Results 1 - 10 of 22
Use of the Zero-Norm With Linear Models and Kernel Methods
, 2002
Cited by 171 (3 self)
We explore the use of the so-called zero-norm of the parameters of linear models in learning.
Proximal support vector machine classifiers
 Proceedings KDD-2001: Knowledge Discovery and Data Mining
, 2001
Cited by 152 (16 self)
A new approach to support vector machine (SVM) classification is proposed wherein each of two data sets is proximal to one of two distinct planes that are not parallel to each other. Each plane is generated such that it is closest to one of the two data sets and as far as possible from the other data set. Each of the two nonparallel proximal planes is obtained by a single MATLAB command as the eigenvector corresponding to a smallest eigenvalue of a generalized eigenvalue problem. Classification by proximity to two distinct nonlinear surfaces generated by a nonlinear kernel also leads to two simple generalized eigenvalue problems. The effectiveness of the proposed method is demonstrated by tests on simple examples as well as on a number of public data sets. These examples show the advantages of the proposed approach in both computation time and test set correctness. Index Terms: support vector machines, proximal classification, generalized eigenvalues.
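The idea in the abstract above (a plane close to one data set and far from the other, found as the eigenvector for the smallest eigenvalue of a generalized eigenvalue problem) can be illustrated with a minimal NumPy sketch. The abstract mentions MATLAB; Python is swapped in here. The function names, the synthetic cross-shaped data, the Rayleigh-quotient form of the objective, and the small regularization constant `delta` are all assumptions made for illustration and are not taken from the listing.

```python
import numpy as np

def proximal_plane(A, B, delta=1e-3):
    """Return (w, gamma) of a plane x @ w = gamma near the rows of A and far from the rows of B.

    Sketch only: minimizes (||A w - gamma||^2 + delta ||z||^2) / ||B w - gamma||^2
    over z = [w, gamma]. The regularizer delta is an assumption.
    """
    GA = np.hstack([A, -np.ones((A.shape[0], 1))])  # rows [x, -1], so GA @ z = x @ w - gamma
    GB = np.hstack([B, -np.ones((B.shape[0], 1))])
    G = GA.T @ GA + delta * np.eye(GA.shape[1])     # regularized numerator matrix
    H = GB.T @ GB                                   # denominator matrix, assumed invertible here
    # The generalized problem G z = lambda H z is solved as the eigenproblem of H^{-1} G.
    vals, vecs = np.linalg.eig(np.linalg.solve(H, G))
    z = np.real(vecs[:, np.argmin(np.real(vals))])  # eigenvector of the smallest eigenvalue
    return z[:-1], z[-1]

def plane_distance(X, w, gamma):
    """Euclidean distance of each row of X to the plane x @ w = gamma."""
    return np.abs(X @ w - gamma) / np.linalg.norm(w)

# Hypothetical cross-shaped data: class A lies near y = x, class B near y = -x.
rng = np.random.default_rng(0)
t = rng.uniform(-2.0, 2.0, size=(100, 1))
s = rng.uniform(-2.0, 2.0, size=(100, 1))
A = np.hstack([t, t + 0.1 * rng.normal(size=(100, 1))])
B = np.hstack([s, -s + 0.1 * rng.normal(size=(100, 1))])

wA, gA = proximal_plane(A, B)  # plane close to A, far from B
wB, gB = proximal_plane(B, A)  # plane close to B, far from A

# Classify each point by the nearer of the two planes.
X = np.vstack([A, B])
y = np.array([0] * 100 + [1] * 100)
pred = (plane_distance(X, wB, gB) < plane_distance(X, wA, gA)).astype(int)
accuracy = float(np.mean(pred == y))
print(f"training accuracy: {accuracy:.2f}")
```

Data of this shape cannot be separated by a single plane, which is the kind of case where two nonparallel planes are useful. Points near the crossing of the two lines remain ambiguous.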
Multiverse recommendation: N-dimensional tensor factorization for context-aware collaborative filtering
 In Proceedings of the fourth ACM conference on Recommender systems
, 2010
Cited by 71 (4 self)
Context has been recognized as an important factor to consider in personalized Recommender Systems. However, most model-based Collaborative Filtering approaches such as Matrix Factorization do not provide a straightforward way of integrating context information into the model. In this work, we introduce a Collaborative Filtering method based on Tensor Factorization, a generalization of Matrix Factorization that allows for a flexible and generic integration of contextual information by modeling the data as a User-Item-Context N-dimensional tensor instead of the traditional 2D User-Item matrix. In the proposed model, called Multiverse Recommendation, different types of context are considered as additional dimensions in the representation of the data as a tensor. The factorization of this tensor leads to a compact model of the data which can be used to provide context-aware recommendations. We provide an algorithm to address the N-dimensional factorization, and show that the Multiverse Recommendation improves upon non-contextual Matrix Factorization by up to 30% in terms of the Mean Absolute Error (MAE). We also compare to two state-of-the-art context-aware methods and show that Tensor Factorization consistently outperforms them both in semi-synthetic and real-world data: improvements range from 2.5% to more than 12% depending on the data. Notably, our approach outperforms other methods by a wider margin whenever more contextual information is available.
Nonparametric quantile estimation
, 2006
Cited by 52 (9 self)
In regression, the desired estimate of y|x is not always given by a conditional mean, although this is most common. Sometimes one wants to obtain a good estimate that satisfies the property that a proportion, τ, of y|x will be below the estimate. For τ = 0.5 this is an estimate of the median. What might be called median regression is subsumed under the term quantile regression. We present a nonparametric version of a quantile estimator, which can be obtained by solving a simple quadratic programming problem, and provide uniform convergence statements and bounds on the quantile property of our estimator. Experimental results show the feasibility of the approach and competitiveness of our method with existing ones. We discuss several types of extensions, including an approach to solve the quantile crossing problem, as well as a method to incorporate prior qualitative knowledge such as monotonicity constraints.
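The quantile property described above (a proportion τ of the targets falls below the estimate) can be checked numerically in the simplest setting, a constant estimate with no covariates. The sketch below uses the pinball loss, the standard loss in quantile regression. The abstract does not name a loss, so its use here is an assumption, as are the function names and the synthetic Gaussian data. It is not the quadratic-programming estimator the abstract refers to.

```python
import random

def pinball_loss(y, f, tau):
    """Mean pinball loss of the constant prediction f on the targets y."""
    total = 0.0
    for yi in y:
        r = yi - f
        # Under-prediction is weighted by tau, over-prediction by (1 - tau).
        total += tau * r if r >= 0 else (tau - 1.0) * r
    return total / len(y)

def best_constant(y, tau):
    """Sample value that minimizes the pinball loss (brute-force search)."""
    return min(y, key=lambda c: pinball_loss(y, c, tau))

random.seed(0)
y = [random.gauss(0.0, 1.0) for _ in range(1001)]  # hypothetical sample
tau = 0.9
q = best_constant(y, tau)

# Fraction of targets at or below the estimate; it should be close to tau.
frac_below = sum(yi <= q for yi in y) / len(y)
print(f"tau = {tau}, fraction below estimate = {frac_below:.3f}")
```

Setting `tau = 0.5` in the same code yields the sample median, matching the median case mentioned in the abstract.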
Training Support Vector Machine using Adaptive Clustering
 in Proc. of the 4th SIAM International Conference on Data Mining, Lake Buena
, 2004
Cited by 30 (3 self)
Training support vector machines involves a huge optimization problem, and many specially designed algorithms have been proposed. In this paper, we propose an algorithm called ClusterSVM that accelerates the training process by exploiting the distributional properties of the training data, that is, the natural clustering of the training data and the overall layout of these clusters relative to the decision boundary of support vector machines. The proposed algorithm first partitions the training data into several pairwise disjoint clusters. Then, the representatives of these clusters are used to train an initial support vector machine, based on which we can approximately identify the support vectors and non-support vectors. After replacing each cluster containing only non-support vectors with its representative, the number of training data can be significantly reduced, thereby speeding up the training process. The proposed ClusterSVM has been tested against the popular training algorithm SMO on both artificial and real data, and a significant speedup was observed. The complexity of ClusterSVM scales with the square of the number of support vectors and, after a further improvement, it is expected that it will scale with the square of the number of non-boundary support vectors.
Sparse metric learning via smooth optimization
 In
, 2009
Cited by 27 (4 self)
In this paper we study the problem of learning a low-rank (sparse) distance matrix. We propose a novel metric learning model which can simultaneously conduct dimension reduction and learn a distance matrix. The sparse representation involves a mixed-norm regularization which is non-convex. We then show that it can be equivalently formulated as a convex saddle (min-max) problem. From this saddle representation, we develop an efficient smooth optimization approach [17] for sparse metric learning, although the learning model is based on a non-differentiable loss function. Finally, we run experiments to validate the effectiveness and efficiency of our sparse metric learning model on various datasets.
Multiclass Classification with Multi-Prototype Support Vector Machines
 Journal of Machine Learning Research
, 2005
Cited by 16 (0 self)
Winner-take-all multiclass classifiers are built on top of a set of prototypes, each representing one of the available classes. A pattern is then classified with the label associated with the most 'similar' prototype. Recent proposals of SVM extensions to multiclass can be considered instances of the same strategy with one prototype per class.
Simpler knowledge-based support vector machines
 In Proceedings of the Twenty-Third International Conference on Machine Learning
, 2006
Cited by 11 (0 self)
If appropriately used, prior knowledge can significantly improve the predictive accuracy of learning algorithms or reduce the amount of training data needed. In this paper we introduce a simple method to incorporate prior knowledge in support vector machines by modifying the hypothesis space rather than the optimization problem. The optimization problem is amenable to solution by the constrained concave-convex procedure, which finds a local optimum. The paper discusses different kinds of prior knowledge and demonstrates the applicability of the approach in some characteristic experiments.
Minimum Reference Set Based Feature Selection for Small Sample Classifications
Cited by 2 (0 self)
We address feature selection problems for classification with small samples and high dimensionality. A practical example is microarray-based cancer classification, where the sample size is typically less than 100 and the number of features is several thousand or higher. One of the commonly used methods for addressing this problem is the recursive feature elimination (RFE) method, which utilizes the generalization capability embedded in support vector machines and is thus suitable for small-sample problems. We propose a novel method using the minimum reference set (MRS) generated by the nearest neighbor rule. The MRS is the smallest set of samples that correctly classifies all the training samples. It is related to the structural risk minimization principle and thus leads to good generalization. The proposed MRS-based method is compared to the RFE method on several real datasets, and experimental results show that the MRS method produces better classification performance.
MODEL BUILDING WITH LIKELIHOOD BASIS PURSUIT
, 2004
Cited by 1 (0 self)
We consider a nonparametric penalized likelihood approach for model building called likelihood basis pursuit (LBP) that determines the probabilities of binary outcomes given explanatory vectors while automatically selecting important features. The LBP model involves parameters that balance the competing goals of maximizing the log-likelihood and minimizing the penalized basis pursuit terms. These parameters are selected to minimize a proxy of misclassification error, namely, the randomized, generalized approximate cross validation (ranGACV) function. The ranGACV function is not easily represented in compact form; its functional values can only be obtained by solving two instances of the LBP model, which may be computationally expensive. A grid search is typically used to find appropriate parameters, requiring the solutions to hundreds or thousands of instances of the LBP model. Since only parameters (data) are changed between solves, the resulting problem is a nonlinear slice model in the parameter space. We show how slice-modeling techniques significantly improve the efficiency of individual solves and thus speed up the grid search. In addition, we consider using derivative-free optimization algorithms for parameter selection, replacing the grid search. We show how, by seeding the derivative-free algorithms with a coarse grid search, these algorithms can find better solutions with fewer function evaluations. Our interest in this area comes directly from the seminal work that Olvi and his collaborators have carried out designing and applying optimization techniques to problems in machine learning and data mining.