A Direct Method for Building Sparse Kernel Learning Algorithms
- JOURNAL OF MACHINE LEARNING RESEARCH
, 2006
Cited by 16 (0 self)
Many kernel learning algorithms, including support vector machines, result in a kernel machine, such as a kernel classifier, whose key component is a weight vector in a feature space implicitly introduced by a positive definite kernel function. This weight vector is usually obtained by solving a convex optimization problem. Based on this fact, we present a direct method to build sparse kernel learning algorithms by adding one more constraint to the original convex optimization problem, such that the sparseness of the resulting kernel machine is explicitly controlled while performance is kept as high as possible. A gradient-based approach is provided to solve this modified optimization problem. Applying
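The added constraint can be pictured as restricting the weight vector to the span of a small set of expansion vectors, w = Σ_j β_j φ(z_j). The machine then needs only N_z kernel evaluations per prediction. The following is a minimal sketch under several assumptions of ours:

- an RBF kernel;
- squared loss with a ‖w‖² penalty;
- no bias term;
- expansion vectors fixed to a subset of the training inputs.

The paper instead optimizes the expansion vectors with its gradient-based method, which is not reproduced here. The names `rbf_kernel` and `SparseKernelMachine` are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Gaussian kernel k(a, b) = exp(-gamma * ||a - b||^2) for every pair of rows of A and B
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

class SparseKernelMachine:
    """Hypothetical sketch: f(x) = sum_j beta_j k(z_j, x), i.e. w restricted to
    span{phi(z_1), ..., phi(z_Nz)}. No bias term, for brevity."""

    def __init__(self, Z, gamma=1.0, lam=1e-3):
        self.Z = np.asarray(Z, dtype=float)  # expansion vectors; Nz = len(Z) fixes the sparseness
        self.gamma = gamma
        self.lam = lam

    def fit(self, X, y):
        Kxz = rbf_kernel(X, self.Z, self.gamma)
        Kzz = rbf_kernel(self.Z, self.Z, self.gamma)
        # Minimize ||Kxz beta - y||^2 + lam * beta^T Kzz beta (= lam * ||w||^2 inside the span);
        # a tiny jitter keeps the linear system well posed.
        A = Kxz.T @ Kxz + self.lam * Kzz + 1e-10 * np.eye(len(self.Z))
        self.beta = np.linalg.solve(A, Kxz.T @ y)
        return self

    def predict(self, X):
        # Only Nz kernel evaluations per input, independent of the training set size
        return rbf_kernel(X, self.Z, self.gamma) @ self.beta
```

For example, `SparseKernelMachine(X[::7]).fit(X, y)` keeps every seventh training input as an expansion vector.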
Online (and Offline) on an Even Tighter Budget
Cited by 9 (0 self)
We develop a fast online kernel algorithm for classification which can be viewed as an improvement over the one suggested by Crammer, Kandola and Singer (2004), titled "Online Classification on a Budget". In that previous work, the authors introduced an on-the-fly compression of the number of examples used in the prediction function, using the size of the margin as a quality measure. Although their algorithm displays impressive results on relatively noise-free data, we show that it is susceptible to noise. Utilizing
Building sparse large margin classifiers
- In: 22nd International Conf. on Machine Learning, ACM Press
, 2005
Cited by 7 (1 self)
This paper presents an approach to build Sparse Large Margin Classifiers (SLMC) by adding one more constraint to the standard Support Vector Machine (SVM) training problem. The added constraint explicitly controls the sparseness of the classifier, and an approach is provided to solve the formulated problem. When considering the dual of this problem, it can be seen that building an SLMC is equivalent to constructing an SVM with a modified kernel function. Further analysis of this kernel function indicates that the proposed approach essentially finds a discriminating subspace that can be spanned by a small number of vectors, and in this subspace different classes of data are linearly well separated. Experimental results on several classification benchmarks show that in most cases the proposed approach outperforms state-of-the-art sparse learning algorithms.
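This sketch assumes the modified kernel takes the projected form k_Z(x, y) = k(x, Z) K_ZZ⁻¹ k(Z, y), where Z holds the N_z expansion vectors. That form is our reading of "an SVM with a modified kernel function", not a formula quoted from the abstract. The code computes this kernel for an RBF base kernel. Its Gram matrix has rank at most N_z, which is one way to see why the resulting classifier can be expressed with only N_z vectors. Here Z is a fixed subset of the inputs, whereas SLMC optimizes it. The function names are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Gaussian kernel k(a, b) = exp(-gamma * ||a - b||^2) for every pair of rows
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def modified_kernel(A, B, Z, gamma=1.0, jitter=1e-10):
    """Assumed form k_Z(a, b) = k(a, Z) K_ZZ^{-1} k(Z, b): the base kernel with both
    feature vectors projected onto span{phi(z_1), ..., phi(z_Nz)}."""
    Kzz = rbf_kernel(Z, Z, gamma) + jitter * np.eye(len(Z))  # jitter guards against near-singular K_ZZ
    Kaz = rbf_kernel(A, Z, gamma)
    Kbz = rbf_kernel(B, Z, gamma)
    return Kaz @ np.linalg.solve(Kzz, Kbz.T)
```

The resulting matrix could be passed to any SVM solver that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`.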
Sparse Regression as a Sparse Eigenvalue Problem
Cited by 7 (1 self)
We extend the $\ell_0$-norm "subspectral" algorithms developed for sparse LDA [5] and sparse PCA [6] to more general quadratic costs, such as the MSE in linear (or kernel) regression. The resulting "Sparse Least Squares" (SLS) problem is also NP-hard, by way of its equivalence to a rank-1 sparse eigenvalue problem. Specifically, for minimizing general quadratic cost functions, we use a highly efficient method for direct eigenvalue computation based on partitioned matrix inverse techniques that leads to $10^3\times$ speed-ups over standard eigenvalue decomposition. This increased efficiency mitigates the $O(n^4)$ complexity that limited the previous algorithms' utility for high-dimensional problems. Moreover, the new computation prioritizes the role of the less myopic backward elimination stage, which becomes even more efficient than forward selection. Similarly, branch-and-bound search for Exact Sparse Least Squares (ESLS) also benefits from partitioned matrix techniques. Our Greedy Sparse Least Squares (GSLS) algorithm generalizes Natarajan's algorithm [9], also known as Order-Recursive Matching Pursuit (ORMP). Specifically, the forward pass of GSLS is exactly equivalent to ORMP but is more efficient, and by including the backward pass, which only doubles the computation, we can achieve a lower MSE than ORMP. In experimental comparisons with LARS [3], forward-GSLS is shown to be not only more efficient and accurate but also more flexible in terms of choice of regularization.
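The forward selection rule that GSLS shares with ORMP can be sketched by brute force. At each step, add the column whose inclusion gives the smallest least-squares residual. The sketch re-solves a full least-squares problem for every candidate, so it shows only the selection rule. It does not reproduce the partitioned-matrix-inverse speed-ups or the backward elimination pass. The function name `ormp` is hypothetical.

```python
import numpy as np

def ormp(A, y, k):
    """Greedy forward selection in the ORMP style: at each step, add the column of A
    that most reduces the least-squares residual of y on the selected columns."""
    selected = []
    for _ in range(k):
        best_j, best_err = None, np.inf
        for j in range(A.shape[1]):
            if j in selected:
                continue
            cols = A[:, selected + [j]]
            coef, *_ = np.linalg.lstsq(cols, y, rcond=None)  # refit on the trial set
            err = np.sum((y - cols @ coef) ** 2)
            if err < best_err:
                best_j, best_err = j, err
        selected.append(best_j)
    coef, *_ = np.linalg.lstsq(A[:, selected], y, rcond=None)  # final fit on the chosen set
    return selected, coef
```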
Sparse basis selection: New results and application to adaptive prediction of video source traffic
- IEEE Transactions on Networks
, 2005
Cited by 4 (1 self)
Real-time prediction of video source traffic is an important step in many network management tasks, such as dynamic bandwidth allocation and end-to-end quality-of-service (QoS) control strategies. In this paper, an adaptive prediction model for MPEG-coded traffic is developed. It uses a technique first developed in the signal processing community, called sparse basis selection, which selects a small subset of inputs (basis) from a large dictionary of possible inputs. A new sparse basis selection algorithm is developed that efficiently updates the input selection adaptively: when a new measurement is received, the algorithm updates the selected inputs recursively. Thus, adaptability lies not only in the weight adjustment but also in the dynamic update of the inputs. The algorithm is applied to single-step-ahead prediction of MPEG-coded video source traffic, and the developed method achieves better results than those published in the literature. The present analysis indicates that the adaptive feature of the algorithm appears to add significant overall value. Index Terms: Internet traffic, MPEG, sparse basis, sparse representation, video traffic prediction.
Greedy forward selection algorithms to sparse Gaussian Process Regression
- In Proceedings of 2006 International Joint Conference on Neural Networks (IJCNN 2006)
, 2006
Cited by 3 (2 self)
This paper considers the basis vector selection issue involved in forward selection algorithms for sparse Gaussian Process Regression (GPR). First, we re-examine a previous basis vector selection criterion proposed by Smola and Bartlett [20], referred to as loss-smola, and give some new formulae to implement this criterion for the full-greedy strategy more efficiently, in $O(n^2 k_{\max})$ time instead of the original $O(n^2 k_{\max}^2)$, where $n$ is the number of training examples and $k_{\max} \ll n$ is the maximum allowed number of selected basis vectors. Second, to make the algorithm scale linearly in $n$, which is preferable for large datasets, we present an approximation to the loss-smola criterion, called loss-sun. We compare the full-greedy algorithms induced by the loss-sun and loss-smola criteria on several medium-scale datasets. The advantage of loss-sun over loss-smola is that, coupled with the sub-greedy scheme [20], [7], it can lead to an algorithm that scales as $O(n k_{\max}^2)$ time and $O(n k_{\max})$ memory. Our criterion is similar to a matching pursuit approach, referred to as loss-keert, proposed recently by Keerthi and Chu [7], but with a different motivation. Numerical experiments on a number of large-scale datasets demonstrate that our proposed method is always better than loss-keert in both generalization performance and running time. Finally, we discuss the drawbacks of the sub-greedy strategy and present two approximate full-greedy strategies, which can be applied to all three basis vector selection criteria discussed in this paper.
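The following is one plausible sketch of full-greedy forward selection for sparse GPR. It uses the subset-of-regressors (SoR) predictive mean μ(x) = k(x, I)(σ²K_II + K_IX K_XI)⁻¹K_IX y. Each candidate is scored by how far it lowers the SoR quadratic objective. This is our reading of a criterion in the Smola and Bartlett style, not the paper's loss-smola or loss-sun formulae.

- It re-solves the system from scratch for every candidate, which is far slower than the paper's formulae.
- The RBF kernel and the noise variance are assumptions.
- The optional `n_candidates` parameter imitates the sub-greedy scheme by scoring only a random subset of candidates.
- All names are hypothetical.

```python
import numpy as np

def rbf(A, B, gamma=1.0):
    # Gaussian kernel k(a, b) = exp(-gamma * ||a - b||^2) for every pair of rows
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def sor_objective(K_xi, K_ii, y, noise):
    # Minimum over a of -y^T K_xi a + 0.5 a^T M a with M = noise * K_ii + K_xi^T K_xi,
    # which equals -0.5 * v^T M^{-1} v with v = K_xi^T y (lower is better)
    M = noise * K_ii + K_xi.T @ K_xi
    v = K_xi.T @ y
    return -0.5 * v @ np.linalg.solve(M, v)

def greedy_sparse_gpr(X, y, k_max, gamma=1.0, noise=0.1, n_candidates=None, seed=0):
    rng = np.random.default_rng(seed)
    K = rbf(X, X, gamma)
    selected = []
    for _ in range(k_max):
        remaining = [j for j in range(len(X)) if j not in selected]
        if n_candidates is not None:  # sub-greedy: score only a random subset of candidates
            size = min(n_candidates, len(remaining))
            remaining = list(rng.choice(remaining, size=size, replace=False))
        scores = []
        for j in remaining:
            S = selected + [j]
            scores.append(sor_objective(K[:, S], K[np.ix_(S, S)], y, noise))
        selected.append(remaining[int(np.argmin(scores))])
    # SoR weights: predictive mean is rbf(x, X[selected]) @ alpha
    K_xi = K[:, selected]
    M = noise * K[np.ix_(selected, selected)] + K_xi.T @ K_xi
    alpha = np.linalg.solve(M, K_xi.T @ y)
    return selected, alpha
```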
A hybrid HMM-based speech recognizer using kernel-based discriminants as acoustic models
- In 18th International Conference on Pattern Recognition (ICPR 2006), 20-24 August 2006, Hong Kong
, 2006
Cited by 3 (0 self)
In this paper, we propose a novel order-recursive training algorithm for kernel-based discriminants which is computationally efficient. We integrate this method into a hybrid HMM-based speech recognition system by translating the outputs of the kernel-based classifier into class-conditional probabilities and using them instead of Gaussian mixtures as production probabilities of an HMM-based decoder for speech recognition. The performance of the described hybrid structure is demonstrated on the DARPA Resource Management (RM1) corpus.
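The abstract does not say how classifier outputs become class-conditional probabilities. A common hybrid-HMM approach divides posteriors by class priors, giving "scaled likelihoods" p(x|c)/p(x) = P(c|x)/P(c). The sketch below assumes that approach and further assumes a softmax turns raw discriminant scores into posteriors. Both assumptions are ours, and `scaled_likelihoods` is a hypothetical name.

```python
import numpy as np

def scaled_likelihoods(scores, priors, floor=1e-8):
    """Map per-frame classifier scores (frames x classes) to log emission scores.
    Assumes softmax(scores) approximates P(c|x); by Bayes' rule,
    log p(x|c) - log p(x) = log P(c|x) - log P(c). The dropped p(x) term is the same
    for every state in a frame, so it does not change which Viterbi path is best."""
    scores = np.asarray(scores, dtype=float)
    z = scores - scores.max(axis=1, keepdims=True)  # subtract row max for a stable exp
    post = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    return np.log(np.maximum(post, floor)) - np.log(priors)
```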
Updates for nonlinear discriminants
- In 20th International Joint Conference on Artificial Intelligence (IJCAI 2007)
, 2007
Cited by 3 (1 self)
A novel training algorithm for nonlinear discriminants for classification and regression in Reproducing Kernel Hilbert Spaces (RKHSs) is presented. It is shown how the overdetermined linear least-squares problem in the corresponding RKHS may be solved within a greedy forward selection scheme by updating the pseudoinverse in an order-recursive way. The described construction of the pseudoinverse gives rise to an update of the orthogonal decomposition of the reduced Gram matrix in linear time. Regularization in the spirit of ridge regression may then easily be applied in the orthogonal space. Various experiments for both classification and regression are performed to show the competitiveness of the proposed method.
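Here is a rough sketch of the idea, under our own assumptions. It runs greedy forward selection over the columns of a Gram matrix K and orthogonalizes each selected column against the previous ones. This order-recursive Gram-Schmidt update maintains K[:, S] = QR. Ridge-style shrinkage c = Qᵀy/(1+λ) is applied in the orthogonal basis, and the result is mapped back through R. Gram-Schmidt stands in for the paper's pseudoinverse update, and the shrinkage is one simple choice of regularizer. `greedy_orthogonal_fit` is a hypothetical name.

```python
import numpy as np

def greedy_orthogonal_fit(K, y, k, lam=0.0, tol=1e-10):
    """Greedy forward selection of columns of K (e.g. a kernel Gram matrix) with an
    order-recursive Gram-Schmidt update, so that K[:, selected] = Q @ R."""
    n, m = K.shape
    P = np.array(K, dtype=float)          # candidate columns, kept orthogonal to span(Q)
    residual = np.array(y, dtype=float)   # y minus its projection onto span(Q)
    Q = np.zeros((n, k))
    R = np.zeros((k, k))
    selected = []
    for t in range(k):
        norms = np.linalg.norm(P, axis=0)
        safe = np.where(norms > tol, norms, 1.0)
        # Drop in squared residual if column j were added next
        gains = (P.T @ residual) ** 2 / safe ** 2
        gains[norms <= tol] = -np.inf
        gains[selected] = -np.inf
        j = int(np.argmax(gains))
        q = P[:, j] / norms[j]
        R[:t, t] = Q[:, :t].T @ K[:, j]   # projections of the new column on earlier directions
        R[t, t] = norms[j]
        Q[:, t] = q
        selected.append(j)
        # Order-recursive updates: remove the new direction from candidates and residual
        P -= np.outer(q, q @ P)
        residual -= (q @ residual) * q
    # Ridge in the orthogonal space: min ||y - Q c||^2 + lam * ||c||^2  ->  c = Q^T y / (1 + lam)
    c = Q.T @ y / (1.0 + lam)
    a = np.linalg.solve(R, c)             # coefficients on the selected columns of K
    return selected, a
```

With `lam=0` this reproduces the unregularized least-squares fit on the selected columns.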
Constructing Orthogonal Latent Features for Arbitrary Loss
Cited by 3 (0 self)
A boosting framework for constructing orthogonal features targeted to a given loss function is developed. Combined with techniques from spectral methods such as PCA and PLS, an orthogonal boosting algorithm for linear hypotheses is used to efficiently construct orthogonal latent features selected to optimize the given loss function. The method is generalized to construct orthogonal nonlinear features using the kernel trick. The resulting method, Boosted Latent Features (BLF), is shown both to construct valuable orthogonal features and to be a competitive inference method for a variety of loss functions. For the least squares loss, BLF reduces to the PLS algorithm and preserves all the attractive properties of that algorithm. As in PCA and PLS, the resulting nonlinear features are valuable for visualization, dimensionality reduction, improving generalization by regularization, and use in other learning algorithms, but now these features can be targeted to a specific inference task or loss function. The data matrix is factorized by the extracted features. The low-rank approximation of the data matrix provides efficiency and stability in computation, an attractive characteristic of PLS-type methods. Computational results demonstrate the effectiveness of the approach on a wide range of classification and regression problems.
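BLF reduces to PLS for the least squares loss. A NIPALS-style PLS1 sketch therefore illustrates what "orthogonal latent features" means in that special case. Each component is a score vector t = Xw, and deflating X after each step makes successive scores mutually orthogonal. This is standard linear PLS1, not the general BLF boosting algorithm or its kernel version. `pls1` is a hypothetical helper name.

```python
import numpy as np

def pls1(X, y, n_components):
    """NIPALS-style PLS1: extract mutually orthogonal latent features (score vectors)
    that greedily explain y under squared loss. Returns scores T and weights W."""
    X = X - X.mean(axis=0)  # center (new arrays, the caller's data is untouched)
    y = y - y.mean()
    T, W = [], []
    for _ in range(n_components):
        w = X.T @ y
        w /= np.linalg.norm(w)          # unit-norm weight direction
        t = X @ w                       # latent feature (score vector)
        p = X.T @ t / (t @ t)           # loadings
        X = X - np.outer(t, p)          # deflate X, so later scores are orthogonal to t
        y = y - t * (t @ y) / (t @ t)   # deflate y
        T.append(t)
        W.append(w)
    return np.column_stack(T), np.column_stack(W)
```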
A Gradient-based Forward Greedy Algorithm for Sparse Gaussian Process Regression
- In Trends in Neural Computation
, 2006
Cited by 1 (1 self)
In this chapter, we present a gradient-based forward greedy method for sparse approximation of the Bayesian Gaussian Process Regression (GPR) model. Unlike previous work, which is mostly based on various basis vector selection strategies, we propose to construct, rather than select, a new basis vector at each iteration. This idea was motivated by the well-known gradient boosting approach. The resulting algorithm, built on gradient-based optimisation packages, incurs computational cost and memory requirements similar to those of other leading sparse GPR algorithms. Moreover, the proposed work is a general framework which can be extended to other popular kernel machines, including Kernel Logistic Regression (KLR) and Support Vector Machines (SVMs). Numerical experiments on a wide range of datasets are presented to demonstrate the superiority of our algorithm in terms of generalisation performance.

