## Kernel matching pursuit (2002)

Venue: Machine Learning

Citations: 84 (0 self)

### Citations

12893 | The Nature of Statistical Learning Theory
- Vapnik
- 1995
Citation Context: ...functions, boosting. 1. Introduction. Recently, there has been a renewed interest in kernel-based methods, due in great part to the success of the Support Vector Machine approach (Boser et al., 1992; Vapnik, 1995). Kernel-based learning algorithms represent the function value f(x) to be learned with a linear combination of terms of the form K(x, x_i), where x_i is generally the input vector associated with one...
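The functional form mentioned in this excerpt, f(x) = Σ_i α_i K(x, x_i), can be illustrated with a small sketch (toy data; the Gaussian kernel choice and all names are ours, not the paper's code):

```python
import numpy as np

def gaussian_kernel(x, xi, sigma=1.0):
    """K(x, x_i) = exp(-||x - x_i||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((x - xi) ** 2) / (2.0 * sigma ** 2))

def kernel_expansion(x, centers, alphas, sigma=1.0):
    """f(x) = sum_i alpha_i K(x, x_i): the form shared by SVMs, RBF networks and KMP."""
    return sum(a * gaussian_kernel(x, xi, sigma) for a, xi in zip(alphas, centers))

# Toy example: two centers with weights +1 and -1.
centers = [np.array([0.0]), np.array([2.0])]
alphas = [1.0, -1.0]
print(kernel_expansion(np.array([0.0]), centers, alphas))
```

The algorithms cited above differ mainly in how the centers x_i and the weights α_i are chosen, not in this functional form.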

2165 | Experiments with a new boosting algorithm
- Freund, Schapire
- 1996
Citation Context: ...between OLS-RBF and SVMs, although their resulting functional forms are very much alike. This is one of the contributions of this paper. KMP in its basic form is also very similar to boosting algorithms [4, 6], in which the chosen class of weak learners would be the set of kernels centered on the training points. These algorithms differ mainly in the loss function they optimize, which we discuss in the sec...

1827 | A training algorithm for optimal margin classifiers
- Boser, Guyon, et al.
- 1992
Citation Context: ...chines, radial basis functions, boosting. 1. Introduction. Recently, there has been a renewed interest in kernel-based methods, due in great part to the success of the Support Vector Machine approach (Boser et al., 1992; Vapnik, 1995). Kernel-based learning algorithms represent the function value f(x) to be learned with a linear combination of terms of the form K(x, x_i), where x_i is generally the input vector ass...

1703 | Additive logistic regression: a statistical view of boosting. The Annals of Statistics
- Friedman, Hastie, et al.
- 2000
Citation Context: ...ns. The loss functions that boosting algorithms optimize are typically expressed as functions of m. Thus AdaBoost (Schapire et al., 1998) uses an exponential margin loss function, e^{-m}; LogitBoost (Friedman et al., 1998) uses the negative binomial log-likelihood, log_2(1 + e^{-2m}), whose shape is similar to a smoothed version of the soft-margin SVM loss function [1 - m]_+; and Doom II (Mason et al., 2000) approxima...
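The three margin losses named in this excerpt are simple one-liners; a sketch (function names are ours):

```python
import numpy as np

def adaboost_loss(m):
    """Exponential margin loss used by AdaBoost: e^{-m}."""
    return np.exp(-m)

def logitboost_loss(m):
    """Negative binomial log-likelihood used by LogitBoost: log_2(1 + e^{-2m})."""
    return np.log2(1.0 + np.exp(-2.0 * m))

def soft_margin_svm_loss(m):
    """Soft-margin SVM hinge loss: [1 - m]_+."""
    return np.maximum(1.0 - m, 0.0)

# All three penalize negative margins heavily and flatten out (or vanish)
# for large positive margins.
for m in (-1.0, 0.0, 2.0):
    print(m, adaboost_loss(m), logitboost_loss(m), soft_margin_svm_loss(m))
```

At m = 0 all three losses equal 1, which makes their shapes easy to compare on a plot.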

1642 | Matching pursuit with time-frequency dictionaries
- Mallat, Zhang
- 1993
Citation Context: ...originally introduced in the signal-processing community as an algorithm "that decomposes any signal into a linear expansion of waveforms that are selected from a redundant dictionary of functions." (Mallat and Zhang, 1993). It is a general, greedy, sparse function approximation scheme with the squared error loss, which iteratively adds new functions (i.e. basis functions) to the linear expansion. If we take as "dictio...
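The greedy scheme this passage describes can be sketched minimally (our own toy dictionary and function names; squared-error loss): at each step, pick the dictionary atom most correlated with the current residual, add it with its optimal coefficient, and update the residual.

```python
import numpy as np

def matching_pursuit(y, dictionary, n_steps):
    """Greedy matching pursuit on target vector y; the columns of `dictionary` are the atoms."""
    y = y.astype(float)
    expansion = np.zeros_like(y)
    chosen = []
    for _ in range(n_steps):
        residual = y - expansion
        # Optimal coefficient for each atom g: <residual, g> / ||g||^2;
        # the squared-error reduction for atom k is coefs[k]^2 * ||g_k||^2.
        norms2 = np.sum(dictionary ** 2, axis=0)
        coefs = dictionary.T @ residual / norms2
        k = int(np.argmax(coefs ** 2 * norms2))
        expansion = expansion + coefs[k] * dictionary[:, k]
        chosen.append((k, float(coefs[k])))
    return expansion, chosen

rng = np.random.default_rng(0)
D = rng.standard_normal((20, 8))          # 8 random atoms in 20 dimensions
y = 2.0 * D[:, 3] - 1.0 * D[:, 5]         # a 2-sparse target
approx, picks = matching_pursuit(y, D, n_steps=5)
print(np.linalg.norm(y - approx))
```

Each step can only decrease the squared error, but earlier coefficients are never revisited, which is exactly what the back-fitting (OMP) variant discussed further down corrects.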

942 | Greedy function approximation: a gradient boosting machine. Annals of Statistics 29
- Friedman
- 2001
Citation Context: ...ady been noticed that boosting algorithms are performing a form of gradient descent in function space with respect to particular loss functions (Schapire et al., 1998; Mason et al., 2000). Following (Friedman, 1999), the technique can be adapted to extend the Matching Pursuit family of algorithms to optimize arbitrary differentiable loss functions, instead of doing least-squares fitting. Given a loss function L...
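A minimal sketch of this functional-gradient idea (our own toy construction, not the paper's exact algorithm; the closed-form step size below is exact only for the squared loss): for a differentiable loss L, each greedy step fits the next basis function to the negative gradient of the loss at the current predictions, which for squared error reduces to fitting the ordinary residuals.

```python
import numpy as np

def squared_loss_grad(y, f):
    """d/df of 0.5*(y - f)^2 is (f - y); its negative is the residual y - f."""
    return f - y

def gradient_greedy_step(y, f, dictionary, loss_grad=squared_loss_grad):
    """One functional-gradient step: pick the atom best aligned with the
    negative loss gradient, then move along it. For non-quadratic losses the
    coefficient would come from a separate line search."""
    pseudo_residual = -loss_grad(y, f)
    norms2 = np.sum(dictionary ** 2, axis=0)
    coefs = dictionary.T @ pseudo_residual / norms2
    k = int(np.argmax(coefs ** 2 * norms2))
    return f + coefs[k] * dictionary[:, k], k

rng = np.random.default_rng(1)
D = rng.standard_normal((30, 6))
y = D[:, 0] + 0.5 * D[:, 4]
f = np.zeros(30)
for _ in range(4):
    f, _ = gradient_greedy_step(y, f, D)
print(np.linalg.norm(y - f))
```

Swapping `squared_loss_grad` for the gradient of another differentiable loss is all that changes; the greedy selection machinery stays the same.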

884 | Boosting the margin: A new explanation for the effectiveness of voting methods. The Annals of Statistics
- Schapire, Freund, et al.
- 1998
Citation Context: ...ed error loss. 3.1 Gradient descent in function space. It has been noticed that boosting algorithms are performing a form of gradient descent in function space with respect to particular loss functions [12, 9]. Following [5], the technique can be adapted to extend KMP to optimize arbitrary differentiable loss functions, instead of doing least-squares fitting. Given a loss function L(y_i, f̃_n(x_i)) that comp...

618 | Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decomposition
- Pati, Rezaiifar, et al.
- 1993
Citation Context: ...timal, but so are also their α_{1..n} coefficients. This can be corrected in a step often called back-fitting or back-projection, and the resulting algorithm is known as Orthogonal Matching Pursuit (OMP) (Pati et al., 1993; Davis et al., 1994): while still choosing g_{n+1} as previously (equation 3), we recompute the optimal set of coefficients α_{1..n+1} at each step instead of only the last α_{n+1}: α^{(n+1)}_{1..n+1} = arg min_{α_{1..n+1} ∈ ℝ^{n+1}} ...
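The back-fitting step described here can be sketched as follows (our own minimal implementation on toy data; for the squared error, the joint refit of α_{1..n+1} is an ordinary least-squares solve over the selected atoms):

```python
import numpy as np

def orthogonal_matching_pursuit(y, dictionary, n_steps):
    """OMP: greedy atom selection as in plain matching pursuit, but after each
    selection the whole coefficient vector alpha_{1..n+1} is refit jointly."""
    y = y.astype(float)
    residual = y.copy()
    support, coefs = [], np.zeros(0)
    for _ in range(n_steps):
        norms2 = np.sum(dictionary ** 2, axis=0)
        scores = (dictionary.T @ residual) ** 2 / norms2
        k = int(np.argmax(scores))
        if k not in support:
            support.append(k)
        # Back-fitting / back-projection: jointly optimal coefficients on the support.
        G = dictionary[:, support]
        coefs, *_ = np.linalg.lstsq(G, y, rcond=None)
        residual = y - G @ coefs
    return support, coefs

rng = np.random.default_rng(2)
D = rng.standard_normal((25, 10))
y = 3.0 * D[:, 2] - 2.0 * D[:, 7]
support, coefs = orthogonal_matching_pursuit(y, D, n_steps=3)
print(support, np.linalg.norm(y - D[:, support] @ coefs))
```

The only difference from plain matching pursuit is the `lstsq` refit; selection of the next atom is unchanged.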

422 | Orthogonal least squares learning algorithm for radial basis function networks
- Chen, Cowan, et al.
- 1991
Citation Context: ...radial basis functions. Squared-error KMP with a Gaussian kernel and pre-fitting appears to be identical to a particular Radial Basis Functions training algorithm called Orthogonal Least Squares RBF (Chen et al., 1991) (OLS-RBF). In (Schölkopf et al., 1997) SVMs were compared to "classical RBFs", where the RBF centers were chosen by unsupervised k-means clustering, and SVMs gave better results. To our knowledge, h...

284 | The relevance vector machine
- Tipping
- 2000
Citation Context: ...in terms of margin), but choose different support vectors than SVMs, which are not necessarily close to the decision surface (as they are in SVMs). It should be noted that the Relevance Vector Machine (Tipping, 2000) similarly produces solutions in which the relevance vectors do not lie close to the border. Figure 7, where we used a simple dot-product kernel (i.e. linear decision surfaces), illustrates a proble...

276 | Structural risk minimization over data-dependent hierarchies - Shawe-Taylor, Bartlett, et al. - 1998 |

221 | Sparse greedy matrix approximation for machine learning
- Smola, Schölkopf
- 2000
Citation Context: ...thms developed in the machine-learning community. Connections between a related algorithm (basis pursuit (Chen, 1995)) and SVMs had already been reported in (Poggio and Girosi, 1998). More recently, (Smola and Schölkopf, 2000) shows connections between Matching Pursuit, Kernel-PCA, Sparse Kernel Feature analysis, and how this kind of greedy algorithm can be used to compress the design matrix in SVMs to allow handling of h...

181 | Inference for the generalization error
- Nadeau, Bengio
- 2003
Citation Context: ...Table 3: Results on 4 UCI-MLDB datasets. Again, error rates are not significantly different (values in parentheses are the p-values for the difference with SVMs, as given by the resampled t-test [10]), but KMPs require fewer support vectors.

| Dataset | SVM error | KMP-mse error | KMP-tanh error | SVM #s.v. | KMP-mse #s.v. | KMP-tanh #s.v. |
|---|---|---|---|---|---|---|
| Wisc. Cancer | 3.41% | 3.40% (0.49) | 3.49% (0.45) | 42 | 7 | 21 |
| Sonar | 20.6% | 21.0% ... | | | | |

178 | Comparing support vector machines with Gaussian kernels to radial basis function classifiers
- Schölkopf, Sung, et al.
- 1997
Citation Context: ...ror KMP with a Gaussian kernel and pre-fitting appears to be identical to a particular Radial Basis Functions training algorithm called Orthogonal Least Squares RBF (Chen et al., 1991) (OLS-RBF). In (Schölkopf et al., 1997) SVMs were compared to "classical RBFs", where the RBF centers were chosen by unsupervised k-means clustering, and SVMs gave better results. To our knowledge, however, there has been no experimental ...

155 | Basis pursuit
- Chen
- 1995
Citation Context: ...munity, but there are many interesting links with the research on kernel-based learning algorithms developed in the machine learning community. Connections between a related algorithm (basis pursuit (Chen, 1995)) and SVMs had already been reported in (Poggio and Girosi, 1998). More recently, Smola and Schölkopf (2000) show connections between Matching Pursuit, Kernel-PCA, Sparse Kernel Feature analysis, and ho...

151 | Boosting algorithms as gradient descent
- Mason, Baxter, et al.
- 2000
Citation Context: ...ed error loss. 3.1 Gradient descent in function space. It has been noticed that boosting algorithms are performing a form of gradient descent in function space with respect to particular loss functions [12, 9]. Following [5], the technique can be adapted to extend KMP to optimize arbitrary differentiable loss functions, instead of doing least-squares fitting. Given a loss function L(y_i, f̃_n(x_i)) that comp...

129 | Sparse greedy gaussian process regression - Smola, Bartlett - 2000 |

94 | The perceptron – a perceiving and recognizing automaton
- Rosenblatt
- 1957
Citation Context: ...Gunn and Kandola (2001), who use Basis Pursuit with ANOVA kernels to obtain sparse models with improved interpretability. 4.6 Kernel matching pursuit versus kernel perceptron. The perceptron algorithm (Rosenblatt, 1957) and extensions thereof (Gallant, 1986) are among the simplest algorithms for building linear classifiers. As it is a dot-product based algorithm, the kernel trick introduced by Aizerman et al. (1964) ...
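The kernelized perceptron this excerpt refers to can be sketched as follows (our own minimal implementation on toy 1-D data): since the perceptron only uses dot products, the weight vector can be kept implicitly as mistake counts α_i, giving the decision function f(x) = Σ_i α_i y_i K(x_i, x).

```python
import numpy as np

def gaussian_kernel(X, Z, sigma=1.0):
    """Pairwise K(x, z) = exp(-||x - z||^2 / (2 sigma^2)) for rows of X and Z."""
    d2 = np.sum(X ** 2, axis=1)[:, None] + np.sum(Z ** 2, axis=1)[None, :] - 2.0 * X @ Z.T
    return np.exp(-d2 / (2.0 * sigma ** 2))

def kernel_perceptron(X, y, n_epochs=10, sigma=1.0):
    """Perceptron in the kernel-induced feature space: alpha[i] counts the
    mistakes on example i, so f(x_j) = sum_i alpha_i y_i K(x_i, x_j)."""
    K = gaussian_kernel(X, X, sigma)
    alpha = np.zeros(len(y))
    for _ in range(n_epochs):
        for i in range(len(y)):
            f_i = (alpha * y) @ K[:, i]
            if y[i] * f_i <= 0:       # mistake-driven update
                alpha[i] += 1.0
    return alpha

X = np.array([[0.0], [1.0], [3.0], [4.0]])
y = np.array([-1, -1, 1, 1])
alpha = kernel_perceptron(X, y)
print(np.sign((alpha * y) @ gaussian_kernel(X, X)))
```

Note the contrast drawn in the surrounding text: the perceptron's support vectors are the examples it errs on, whereas KMP selects its basis functions greedily by error reduction.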

93 | Adaptive time-frequency decompositions
- Davis, Mallat, et al.
- 1994
Citation Context: ...lso their α_{1..n} coefficients. This can be corrected in a step often called back-fitting or back-projection, and the resulting algorithm is known as Orthogonal Matching Pursuit (OMP) (Pati et al., 1993; Davis et al., 1994): while still choosing g_{n+1} as previously (equation 3), we recompute the optimal set of coefficients α_{1..n+1} at each step instead of only the last α_{n+1}: α^{(n+1)}_{1..n+1} = arg min_{α_{1..n+1} ∈ ℝ^{n+1}} ...

79 | Sample Compression, Learnability, and the Vapnik-Chervonenkis Dimension
- Floyd, Warmuth
Citation Context: ...th for the computational efficiency of the resulting representation, and for its theoretical and practical influence on generalization performance (see (Graepel, Herbrich and Shawe-Taylor, 2000) and (Floyd and Warmuth, 1995)). However, the sparsity of the solutions found by the SVM algorithm is hardly controllable, and often these solutions are not very sparse. Our research started as a search for a flexible alternative ...

64 | Relating data compression and learnability - Littlestone, Warmuth - 1986 |

44 | A sparse representation for function approximation
- Poggio, Girosi
- 1998
Citation Context: ...e research on kernel-based learning algorithms developed in the machine learning community. Connections between a related algorithm (basis pursuit (Chen, 1995)) and SVMs had already been reported in (Poggio and Girosi, 1998). More recently, Smola and Schölkopf (2000) show connections between Matching Pursuit, Kernel-PCA, Sparse Kernel Feature analysis, and how this kind of greedy algorithm can be used to compress the desig...

42 | Optimal Linear Discriminants
- Gallant
- 1986
Citation Context: ...it with ANOVA kernels to obtain sparse models with improved interpretability. 4.6 Kernel matching pursuit versus kernel perceptron. The perceptron algorithm (Rosenblatt, 1957) and extensions thereof (Gallant, 1986) are among the simplest algorithms for building linear classifiers. As it is a dot-product based algorithm, the kernel trick introduced by Aizerman et al. (1964) readily applies, allowing a straightfor...

38 | Semiparametric support vector and linear programming machines - Smola, Frie, et al. - 1998 |

24 | Relating Data Compression and Learnability, Unpublished manuscript
- Littlestone, Warmuth
- 1986
Citation Context: ...of support vectors chosen by the algorithm. This implies that current theoretical results on generalization bounds that are derived for sparse SVM or Perceptron solutions (Vapnik, 1995; Vapnik, 1998; Littlestone and Warmuth, 1986; Floyd and Warmuth, 1995; Graepel et al., 2000) cannot be readily applied to KMP. On the other hand, KMP solutions may require fewer support vectors than Kernel Perceptron... A comparison with regressi...

11 | Density estimation using support vector machines - Weston, Gammerman, et al. - 1999 |

8 | Leveraged vector machines - Singer |

5 | Generalization error bounds for sparse linear classifiers - Graepel, Herbrich, et al. - 2000 |

3 | Assessing learning procedures using DELVE
- Hinton, Neal, et al.
- 1995
Citation Context: ...5.3 Benchmark datasets. We did some further experiments on 5 well-known datasets from the UCI Machine Learning Databases, using Gaussian kernels. A first series of experiments used the Delve [7] system on the Mushrooms dataset. Hyper-parameters (the σ of the kernel, the bound constraint C for SVM, and the number of support points for KMP) were chosen automatically using K-fold cross validatio...

2 | Structural modelling with sparse kernels, Machine Learning 48, 115–136 - Gunn, Kandola - 2002 |

1 | Large margin classification using the perceptron algorithm - Freund, Schapire - 1998 |

1 | The DELVE Manual. DELVE can be found at http://www.cs.toronto.edu/delve - Rasmussen, Neal, et al. - 1996 |
