### Citations

2781 | Learning with Kernels
- Schölkopf, Smola
- 2002
Citation Context: ...lgorithm 1). Theorem 1. Without sub-sampling, Algorithm 1 is equivalent to Algorithm 2 of [8]. 2 In their book, Smola and Schölkopf redefine their kernel approximation in the same way as we have done [5], however they do not make the connection that it is a compression scheme (see Claim 1). 3 We do not use the “59-trick” in our algorithm – although its inclusion would be trivial and would result in ...

1642 | Matching pursuit with time-frequency dictionaries
- Mallat, Zhang
- 1993
Citation Context: ...thms. 1 Introduction Matching pursuit refers to a family of algorithms that generate a set of bases for learning in a greedy fashion. A good example of this approach is the matching pursuit algorithm [4]. Viewed from this angle sparse kernel principal components analysis (PCA) looks for a small number of kernel basis vectors in order to maximise the Rayleigh quotient. The algorithm was proposed by [8...
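The greedy scheme described in this context can be sketched as follows. This is a minimal illustrative implementation of classic matching pursuit, not code from the cited paper; the dictionary and signal are toy data:

```python
import numpy as np

def matching_pursuit(D, y, n_atoms):
    """Greedy matching pursuit over a dictionary D with unit-norm
    columns: repeatedly pick the atom most correlated with the
    current residual and subtract its projection."""
    r = y.astype(float).copy()           # residual starts as the signal
    coeffs = np.zeros(D.shape[1])
    for _ in range(n_atoms):
        corr = D.T @ r                   # correlation with every atom
        j = int(np.argmax(np.abs(corr))) # best-matching atom
        coeffs[j] += corr[j]             # accumulate its coefficient
        r -= corr[j] * D[:, j]           # deflate the residual
    return coeffs, r

# Toy usage: a signal built from two atoms of a random dictionary
rng = np.random.default_rng(0)
D = rng.normal(size=(4, 8))
D /= np.linalg.norm(D, axis=0)           # unit-norm atoms
y = 2.0 * D[:, 3] + 0.5 * D[:, 6]
coeffs, r = matching_pursuit(D, y, n_atoms=5)
```

Each iteration can only shrink the residual, which is the sense in which the basis is "generated greedily".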

1222 | Kernel Methods for Pattern Analysis
- Shawe-Taylor, Cristianini
- 2004
Citation Context: ...valued functions then the sample compression scheme is said to be a classification or regression algorithm, respectively. 2.1 Sparse kernel principal components analysis Principal components analysis [6] can be expressed as the following maximisation problem: max_w (w′X′Xw) / (w′w), (1) where w is the weight vector. In a sparse KPCA algorithm we would like to find a sparsely represented vector w = X...
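As a sketch of the maximisation in (1): the unconstrained maximiser of the Rayleigh quotient w′X′Xw / w′w is the top right singular vector of X. The function names below are illustrative, not the paper's:

```python
import numpy as np

def rayleigh_quotient(X, w):
    """The objective of equation (1): w'X'Xw / w'w."""
    return (w @ X.T @ X @ w) / (w @ w)

def pca_first_component(X):
    """Unconstrained maximiser of (1): the top right singular
    vector of X (equivalently, the top eigenvector of X'X)."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[0]

# The PCA direction attains at least the quotient of any other w
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 4))
w_star = pca_first_component(X)
w_rand = rng.normal(size=4)
```

The sparse variant then restricts w to lie in the span of a small subset of training points, which is what makes the compression-scheme analysis possible.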

1133 | On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and its Applications
- Vapnik, Chervonenkis
- 1971
Citation Context: ...sis vectors by i. The indices of i are chosen from the training sample S and we denote Si to be those samples indexed by the vector i. Given these definitions and the bound of Vapnik and Chervonenkis [9] we can upper bound the true loss of KMP as follows. Theorem 3. Fix α ∈ R, α > 0. Let A be the regression algorithm of KMP, m the size of the training set S and k the size of the chosen basis vectors ...

411 | Using the Nyström method to speed up kernel machines
- Williams, Seeger
- 2001
Citation Context: ...projection into the space defined solely by the chosen indices is enough to reconstruct the kernel matrix and does not require any extra information. 2 The projection is the well known Nyström method [11]. An orthogonal projection P_i(φ(x_j)) of a feature vector φ(x_j) into a subspace defined only by the set of indices i can be expressed as: P_i(φ(x_j)) = X̃′(X̃X̃′)⁻¹X̃φ(x_j), where X̃ = X[i, :] a...
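In kernel-matrix form this projection needs only kernel entries: the Gram matrix of the projected features is K[:, i] K[i, i]⁻¹ K[i, :]. A minimal sketch with a toy linear kernel (illustrative names, not the paper's code):

```python
import numpy as np

def nystrom_gram(K, i):
    """Gram matrix of the orthogonal projections P_i(phi(x_j)):
    K[:, i] @ K[i, i]^{-1} @ K[i, :], computed from kernel
    entries alone (the Nystrom method)."""
    Kii = K[np.ix_(i, i)]          # kernel among the chosen indices
    Kni = K[:, i]                  # kernel between all points and them
    return Kni @ np.linalg.solve(Kii, Kni.T)

# With a rank-3 linear kernel and 3 linearly independent landmark
# points, the projection reconstructs the full kernel matrix exactly,
# illustrating "enough to reconstruct the kernel matrix".
rng = np.random.default_rng(1)
X = rng.normal(size=(6, 3))
K = X @ X.T                        # linear kernel, rank 3
K_hat = nystrom_gram(K, [0, 1, 2])
```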

221 | Sparse greedy matrix approximation for machine learning
- Smola, Schölkopf
- 2000
Citation Context: ...4]. Viewed from this angle sparse kernel principal components analysis (PCA) looks for a small number of kernel basis vectors in order to maximise the Rayleigh quotient. The algorithm was proposed by [8] 1 and motivated by matching pursuit [4], but to our knowledge sparse PCA has not been analysed theoretically. In this paper we show that sparse PCA (KPCA) is a sample compression scheme and can be bo...

84 | Kernel matching pursuit
- Vincent, Bengio
Citation Context: ...he KPCA subspace. We also derive a more general framework for this algorithm that uses the principle “maximise Rayleigh quotient and deflate”. A related algorithm called kernel matching pursuit (KMP) [10] is a sparse version of least squares regression but without the property of being a compression scheme. However, we use the number of basis vectors constructed by KMP to help upper bound the loss of ...
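A minimal sketch of KMP's greedy loop, in the back-fitting style (the cited paper describes several variants; the names and toy data here are illustrative assumptions):

```python
import numpy as np

def kernel_matching_pursuit(K, y, k):
    """Greedily pick k kernel columns: at each step choose the
    (normalised) column most correlated with the current residual,
    then refit least squares on all chosen columns (back-fitting)."""
    norms = np.linalg.norm(K, axis=0)
    idx, alpha = [], None
    r = y.astype(float).copy()
    for _ in range(k):
        j = int(np.argmax(np.abs(K.T @ r) / norms))  # best column
        idx.append(j)
        B = K[:, idx]
        alpha, *_ = np.linalg.lstsq(B, y, rcond=None)  # refit weights
        r = y - B @ alpha                              # new residual
    return idx, alpha

# Toy regression with a linear kernel
rng = np.random.default_rng(2)
X = rng.normal(size=(20, 5))
K = X @ X.T
y = X @ rng.normal(size=5)
idx, alpha = kernel_matching_pursuit(K, y, k=5)
```

Note that the selected indices are chosen using the labels y, which is why KMP is a sparse regressor but not, by itself, a sample compression scheme in the sense used above.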

79 | Sample Compression, Learnability, and the Vapnik-Chervonenkis Dimension
- Floyd, Warmuth
Citation Context: ...t to our knowledge sparse PCA has not been analysed theoretically. In this paper we show that sparse PCA (KPCA) is a sample compression scheme and can be bounded using the size of the compression set [3, 2] which is the set of training examples used in the construction of the KPCA subspace. We also derive a more general framework for this algorithm that uses the principle “maximise Rayleigh quotient and...

64 | Relating data compression and learnability
- Littlestone, Warmuth
- 1986
Citation Context: ...t to our knowledge sparse PCA has not been analysed theoretically. In this paper we show that sparse PCA (KPCA) is a sample compression scheme and can be bounded using the size of the compression set [3, 2] which is the set of training examples used in the construction of the KPCA subspace. We also derive a more general framework for this algorithm that uses the principle “maximise Rayleigh quotient and...

38 | On the eigenspectrum of the Gram matrix and the generalization error of kernel-PCA. IEEE Transactions on Information Theory
- Shawe-Taylor, Williams, et al.
- 2005
Citation Context: ...ernel principal components analysis (KPCA) by proving that the sparse subspace it produces is a sample compression scheme. We show that this bound is tighter than the KPCA bound of Shawe-Taylor et al. [7] and highly predictive of the size of the subspace needed to capture most of the variance in the data. We analyse a second matching pursuit algorithm called kernel matching pursuit (KMP) which does no...

1 | Partitioning points by parallel planes
- Anthony
Citation Context: ...bound Pr{ S̄ : ℓ_S̄^α(f) = 0, ℓ^α(f) > ɛ } ≤ 2 ( 4e(m − k − t) / (k + 1) )^(k+1) 2^(−ɛ(m−k−t)/2), where we have made use of a bound on the number of dichotomies that can be generated by parallel hyperplanes [1], namely ∑_{i=0}^{k+d−1} C(m^(d−1), i), which is ≤ ( e·m^(d−1) / (k + d − 1) )^(k+d−1), where d is the number of parallel hyperplanes and equals 2 in our case. We now need to consider all of the ways that the 6k basis...
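The counting step can be sanity-checked numerically. Reading the dichotomy bound as the Sauer-style inequality ∑_{i=0}^{t} C(n, i) ≤ (e·n/t)^t with n = m^(d−1) and t = k + d − 1 (this reading of the extraction-damaged context is an assumption), a small check is:

```python
import math

def dichotomy_bound_check(m, k, d):
    """Evaluate both sides of the assumed Sauer-style bound
    sum_{i=0}^{t} C(n, i) <= (e*n/t)**t,
    with n = m**(d-1) and t = k + d - 1."""
    n = m ** (d - 1)
    t = k + d - 1
    lhs = sum(math.comb(n, i) for i in range(t + 1))  # exact count
    rhs = (math.e * n / t) ** t                        # closed-form bound
    return lhs, rhs

# d = 2 parallel hyperplanes, as in the quoted context
lhs, rhs = dichotomy_bound_check(m=50, k=5, d=2)
```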