Results 1–10 of 16
A Direct Method for Building Sparse Kernel Learning Algorithms
 Journal of Machine Learning Research
, 2006
Abstract

Cited by 25 (0 self)
Many kernel learning algorithms, including support vector machines, result in a kernel machine, such as a kernel classifier, whose key component is a weight vector in a feature space implicitly introduced by a positive definite kernel function. This weight vector is usually obtained by solving a convex optimization problem. Based on this fact we present a direct method to build sparse kernel learning algorithms by adding one more constraint to the original convex optimization problem, such that the sparseness of the resulting kernel machine is explicitly controlled while at the same time performance is kept as high as possible. A gradient-based approach is provided to solve this modified optimization problem. Applying ...
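The recipe the abstract describes — a convex kernel training problem plus one extra constraint that caps the number of expansion coefficients — can be illustrated with a much simpler select-and-refit baseline. This is our own illustrative stand-in, not the paper's gradient-based method; the function names, the RBF kernel, and all parameter choices are assumptions:

```python
import numpy as np

def rbf_gram(X, Z, gamma=1.0):
    # Pairwise squared distances -> RBF Gram matrix.
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def sparse_kernel_machine(X, y, k, lam=0.1, gamma=1.0):
    """Select-and-refit sketch: solve the full (convex) kernel ridge
    problem, keep only the k expansion coefficients largest in
    magnitude, then refit those k coefficients by least squares.
    The cardinality k plays the role of the added sparseness
    constraint; the selection heuristic is ours, not the paper's."""
    K = rbf_gram(X, X, gamma)
    beta_full = np.linalg.solve(K + lam * np.eye(len(y)), y)
    keep = np.argsort(np.abs(beta_full))[-k:]
    coef, *_ = np.linalg.lstsq(K[:, keep], y, rcond=None)
    beta = np.zeros_like(beta_full)
    beta[keep] = coef
    return beta
```

The point of the sketch is only that sparseness is controlled explicitly (at most k nonzero coefficients) while the refit step keeps the fit as good as the selected subspace allows.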
Online (and Offline) on an Even Tighter Budget
Abstract

Cited by 17 (1 self)
We develop a fast online kernel algorithm for classification which can be viewed as an improvement over the one suggested by Crammer, Kandola, and Singer (2004), titled "Online Classification on a Budget". In that previous work, the authors introduced an on-the-fly compression of the number of examples used in the prediction function, using the size of the margin as a quality measure. Although it displays impressive results on relatively noise-free data, we show how their algorithm is susceptible to noisy problems. Utilizing ...
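A minimal sketch of the budget idea — mistake-driven updates plus eviction once the support set exceeds a fixed size. The eviction heuristic and all names here are our own simplification, not the authors' exact algorithm:

```python
import numpy as np

def rbf(x, z, gamma=1.0):
    return np.exp(-gamma * np.sum((x - z) ** 2))

def predict_score(x, sv_x, sv_y, gamma=1.0):
    # Kernel expansion over the current support set.
    return sum(y * rbf(x, z, gamma) for z, y in zip(sv_x, sv_y))

def budget_perceptron(stream, budget, gamma=1.0):
    """Mistake-driven kernel perceptron with a hard budget: every
    mistake adds a support vector; beyond the budget, evict the
    vector whose own example keeps the largest margin without it
    (a crude stand-in for margin-based compression)."""
    sv_x, sv_y = [], []
    for x, y in stream:
        if y * predict_score(x, sv_x, sv_y, gamma) <= 0:  # mistake
            sv_x.append(x)
            sv_y.append(y)
            if len(sv_x) > budget:
                def margin_without(i):
                    rest_x = sv_x[:i] + sv_x[i + 1:]
                    rest_y = sv_y[:i] + sv_y[i + 1:]
                    return sv_y[i] * predict_score(sv_x[i], rest_x, rest_y, gamma)
                drop = max(range(len(sv_x)), key=margin_without)
                del sv_x[drop]
                del sv_y[drop]
    return sv_x, sv_y
```

The memory footprint is bounded by `budget` regardless of stream length, which is the property both the original and the improved algorithm trade accuracy against.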
Building sparse large margin classifiers
 In: 22nd International Conference on Machine Learning, ACM Press
, 2005
Abstract

Cited by 9 (1 self)
This paper presents an approach to build Sparse Large Margin Classifiers (SLMC) by adding one more constraint to the standard Support Vector Machine (SVM) training problem. The added constraint explicitly controls the sparseness of the classifier, and an approach is provided to solve the formulated problem. When considering the dual of this problem, it can be seen that building an SLMC is equivalent to constructing an SVM with a modified kernel function. Further analysis of this kernel function indicates that the proposed approach essentially finds a discriminating subspace that can be spanned by a small number of vectors, in which the different classes of data are linearly well separated. Experimental results over several classification benchmarks show that in most cases the proposed approach outperforms state-of-the-art sparse learning algorithms.
Sparse Regression as a Sparse Eigenvalue Problem
Abstract

Cited by 7 (1 self)
Abstract — We extend the l0-norm “subspectral” algorithms developed for sparse-LDA [5] and sparse-PCA [6] to more general quadratic costs such as MSE in linear (or kernel) regression. The resulting “Sparse Least Squares” (SLS) problem is also NP-hard, by way of its equivalence to a rank-1 sparse eigenvalue problem. Specifically, for minimizing general quadratic cost functions we use a highly efficient method for direct eigenvalue computation based on partitioned-matrix inverse techniques that leads to 10^3-fold speedups over standard eigenvalue decomposition. This increased efficiency mitigates the O(n^4) complexity that limited the previous algorithms' utility for high-dimensional problems. Moreover, the new computation prioritizes the role of the less-myopic backward elimination stage, which becomes even more efficient than forward selection. Similarly, branch-and-bound search for Exact Sparse Least Squares (ESLS) also benefits from partitioned-matrix techniques. Our Greedy Sparse Least Squares (GSLS) algorithm generalizes Natarajan's algorithm [9], also known as Order-Recursive Matching Pursuit (ORMP). Specifically, the forward pass of GSLS is exactly equivalent to ORMP but is more efficient, and by including the backward pass, which only doubles the computation, we can achieve a lower MSE than ORMP. In experimental comparisons with LARS [3], forward-GSLS is shown to be not only more efficient and accurate but more flexible in terms of choice of regularization.
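The forward pass the abstract equates with ORMP can be sketched as plain greedy forward selection with refitting. This is a generic OMP-style sketch, not the partitioned-matrix implementation the paper contributes:

```python
import numpy as np

def greedy_sparse_ls(A, y, k):
    """Forward pass of a greedy sparse least-squares solver: repeatedly
    add the column most correlated with the current residual, then
    refit all selected coefficients by least squares. (The paper's
    backward-elimination pass and fast partitioned-matrix updates are
    omitted here.)"""
    support = []
    residual = y.copy()
    for _ in range(k):
        scores = np.abs(A.T @ residual)
        scores[support] = -np.inf          # never pick a column twice
        support.append(int(np.argmax(scores)))
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    return support, coef
```

With orthonormal columns this recovers the true support exactly; in general the refit after each pick is what distinguishes it from simple matching pursuit.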
Sparse basis selection: New results and application to adaptive prediction of video source traffic
 IEEE Transactions on Neural Networks
, 2005
Abstract

Cited by 4 (1 self)
Abstract — Real-time prediction of video source traffic is an important step in many network management tasks such as dynamic bandwidth allocation and end-to-end quality-of-service (QoS) control strategies. In this paper, an adaptive prediction model for MPEG-coded traffic is developed. A novel technique, first developed in the signal processing community and called sparse basis selection, is used. It is based on selecting a small subset of inputs (a basis) from among a large dictionary of possible inputs. A new sparse basis selection algorithm is developed that efficiently updates the input selection adaptively. When a new measurement is received, the proposed algorithm updates the selected inputs in a recursive manner. Thus, adaptability lies not only in the weight adjustment, but also in the dynamic update of the inputs. The algorithm is applied to the problem of single-step-ahead prediction of MPEG-coded video source traffic, and the developed method achieves improved results compared to published results in the literature. The present analysis indicates that the adaptive feature of the developed algorithm adds significant overall value. Index Terms — Internet traffic, MPEG, sparse basis, sparse representation, video traffic prediction.
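As a rough illustration of adaptive input (lag) selection for one-step-ahead prediction, the toy sketch below re-runs a greedy selection over a sliding window at every step. The paper's contribution is precisely a recursive update that avoids this brute-force recomputation, so treat this as a naive baseline with assumed names and parameters:

```python
import numpy as np

def adaptive_predict(series, dict_lags, n_select, window):
    """One-step-ahead prediction that re-selects n_select lags from a
    dictionary of candidate lags on each sliding window, then predicts
    with the refitted weights. Both inputs and weights adapt."""
    preds = []
    max_lag = max(dict_lags)
    for t in range(max_lag + window, len(series)):
        rows = range(t - window, t)
        # Regression matrix over the window: one column per candidate lag.
        A = np.array([[series[i - L] for L in dict_lags] for i in rows])
        y = np.array([series[i] for i in rows])
        # Greedy, correlation-based forward selection of n_select lags.
        sel, r = [], y.copy()
        for _ in range(n_select):
            c = np.abs(A.T @ r)
            c[sel] = -np.inf
            sel.append(int(np.argmax(c)))
            w, *_ = np.linalg.lstsq(A[:, sel], y, rcond=None)
            r = y - A[:, sel] @ w
        x_now = np.array([series[t - L] for L in dict_lags])
        preds.append(float(x_now[sel] @ w))
    return preds
```

On a periodic toy series the selector locks onto the predictive lag and the forecasts become exact; real MPEG traffic is of course far noisier.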
Greedy forward selection algorithms to sparse Gaussian Process Regression
 In Proceedings of the 2006 International Joint Conference on Neural Networks (IJCNN 2006)
, 2006
Abstract

Cited by 3 (2 self)
Abstract — This paper considers the basis vector selection issue involved in forward selection algorithms for sparse Gaussian Process Regression (GPR). Firstly, we re-examine a previous basis vector selection criterion proposed by Smola and Bartlett [20], referred to as loss-smola, and give some new formulae to implement this criterion for the full-greedy strategy more efficiently, in O(n^2 k_max) time instead of the original O(n^2 k_max^2), where n is the number of training examples and k_max ≪ n is the maximally allowed number of selected basis vectors. Secondly, in order to make the algorithm scale linearly in n, which is quite preferable for large datasets, we present an approximate version, loss-sun, of the loss-smola criterion. We compare the full-greedy algorithms induced by the loss-sun and loss-smola criteria, respectively, on several medium-scale datasets. In contrast to loss-smola, the advantage of the loss-sun criterion is that it leads to an algorithm which scales as O(n k_max^2) time and O(n k_max) memory if coupled with the sub-greedy scheme [20], [7]. Our criterion is similar to a matching pursuit approach, referred to as loss-keert, proposed very recently by Keerthi and Chu [7], but with different motivations. Numerical experiments on a number of large-scale datasets have demonstrated that our proposed method is always better than loss-keert in both generalization performance and running time. Finally, we discuss the drawbacks of the sub-greedy strategy and present two approximate full-greedy strategies, which can be applied to all three basis vector selection criteria discussed in this paper.
A hybrid HMM-based speech recognizer using kernel-based discriminants as acoustic models
 In 18th International Conference on Pattern Recognition (ICPR 2006), 20–24 August 2006, Hong Kong
, 2006
Abstract

Cited by 3 (0 self)
In this paper we propose a novel order-recursive training algorithm for kernel-based discriminants which is computationally efficient. We integrate this method into a hybrid HMM-based speech recognition system by translating the outputs of the kernel-based classifier into class-conditional probabilities and using them, instead of Gaussian mixtures, as production probabilities of an HMM-based decoder for speech recognition. The performance of the described hybrid structure is demonstrated on the DARPA Resource Management (RM1) corpus.
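The "translate classifier outputs into emission scores" step in hybrid recognizers is commonly done with scaled likelihoods, p(x|c) ∝ p(c|x)/p(c), which can then feed a standard Viterbi decoder. A minimal sketch of that plumbing (our own illustration; the paper's recognizer is far more elaborate):

```python
import numpy as np

def scaled_likelihoods(posteriors, priors):
    """Convert per-frame class posteriors p(c|x) into scaled
    likelihoods p(x|c) ∝ p(c|x) / p(c) — the usual trick for using a
    discriminative classifier in place of Gaussian-mixture emission
    densities inside an HMM decoder."""
    return posteriors / priors

def viterbi(log_emis, log_trans, log_init):
    """Plain Viterbi over T frames and S states; emissions are any
    per-frame state scores, here log scaled likelihoods."""
    T, S = log_emis.shape
    delta = log_init + log_emis[0]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        cand = delta[:, None] + log_trans      # cand[i, j]: best via i -> j
        back[t] = np.argmax(cand, axis=0)
        delta = cand[back[t], np.arange(S)] + log_emis[t]
    path = [int(np.argmax(delta))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

With uniform priors the scaling only shifts every frame's scores by a constant, so the decoded path matches what the raw posteriors would give; non-uniform priors change the path, which is the point of the correction.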
Updates for nonlinear discriminants
 In 20th International Joint Conference on Artificial Intelligence (IJCAI 2007)
, 2007
Abstract

Cited by 3 (1 self)
A novel training algorithm for nonlinear discriminants for classification and regression in Reproducing Kernel Hilbert Spaces (RKHSs) is presented. It is shown how the overdetermined linear least-squares problem in the corresponding RKHS may be solved within a greedy forward selection scheme by updating the pseudoinverse in an order-recursive way. The described construction of the pseudoinverse gives rise to an update of the orthogonal decomposition of the reduced Gram matrix in linear time. Regularization in the spirit of ridge regression may then easily be applied in the orthogonal space. Various experiments for both classification and regression are performed to show the competitiveness of the proposed method.
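In the generic case where each newly selected column is linearly independent of the previous ones, an order-recursive pseudoinverse update of this kind is given by Greville's column-append formula. A sketch under that assumption (this generic formula stands in for the paper's specific construction):

```python
import numpy as np

def pinv_append_column(A, P, a):
    """Greville-style update: given P = pinv(A), return pinv([A, a])
    without recomputing a full decomposition. Assumes the new column
    a is linearly independent of A's columns (the generic case in
    greedy forward selection)."""
    d = P @ a                      # coefficients of a on range(A)
    c = a - A @ d                  # residual, orthogonal to range(A)
    b = c / (c @ c)                # pseudoinverse of the residual column
    return np.vstack([P - np.outer(d, b), b])
```

Each update costs O(nm) for an n×m matrix, versus recomputing the pseudoinverse from scratch at every forward-selection step.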
Constructing Orthogonal Latent Features for Arbitrary Loss
Abstract

Cited by 3 (0 self)
Summary. A boosting framework for constructing orthogonal features targeted to a given loss function is developed. Combined with techniques from spectral methods such as PCA and PLS, an orthogonal boosting algorithm for linear hypotheses is used to efficiently construct orthogonal latent features selected to optimize the given loss function. The method is generalized to construct orthogonal nonlinear features using the kernel trick. The resulting method, Boosted Latent Features (BLF), is demonstrated both to construct valuable orthogonal features and to be a competitive inference method for a variety of loss functions. For the least-squares loss, BLF reduces to the PLS algorithm and preserves all the attractive properties of that algorithm. As in PCA and PLS, the resulting nonlinear features are valuable for visualization, dimensionality reduction, improving generalization by regularization, and use in other learning algorithms, but now these features can be targeted to a specific inference task or loss function. The data matrix is factorized by the extracted features. The low-rank approximation of the data matrix provides efficiency and stability in computation, an attractive characteristic of PLS-type methods. Computational results demonstrate the effectiveness of the approach on a wide range of classification and regression problems.
A Gradientbased Forward Greedy Algorithm for Sparse Gaussian Process Regression
 In Trends in Neural Computation
, 2006
Abstract

Cited by 1 (1 self)
Abstract. In this chapter, we present a gradient-based forward greedy method for sparse approximation of the Bayesian Gaussian Process Regression (GPR) model. Unlike previous work, which is mostly based on various basis vector selection strategies, we propose to construct, rather than select, a new basis vector at each iterative step. This idea was motivated by the well-known gradient boosting approach. The resulting algorithm, built on gradient-based optimisation packages, incurs similar computational cost and memory requirements to other leading sparse GPR algorithms. Moreover, the proposed work is a general framework which can be extended to deal with other popular kernel machines, including Kernel Logistic Regression (KLR) and Support Vector Machines (SVMs). Numerical experiments on a wide range of datasets are presented to demonstrate the superiority of our algorithm in terms of generalisation performance.