Results 1-10 of 31
Generalization Performance of Regularization Networks and Support . . .
 IEEE TRANSACTIONS ON INFORMATION THEORY
, 2001
Cited by 73 (20 self)
We derive new bounds for the generalization error of kernel machines, such as support vector machines and related regularization networks, by obtaining new bounds on their covering numbers. The proofs make use of a viewpoint that is apparently novel in the field of statistical learning theory. The hypothesis class is described in terms of a linear operator mapping from a possibly infinite-dimensional unit ball in feature space into a finite-dimensional space. The covering numbers of the class are then determined via the entropy numbers of the operator. These numbers, which characterize the degree of compactness of the operator, can be bounded in terms of the eigenvalues of an integral operator induced by the kernel function used by the machine. As a consequence, we are able to theoretically explain the effect of the choice of kernel function on the generalization performance of support vector machines.
Algorithms and Representations for Reinforcement Learning
, 2005
Cited by 37 (7 self)
“If we knew what it was we were doing, it would not be called research, would it?”
Binet-Cauchy kernels on dynamical systems and its application to the analysis of dynamic scenes
 International Journal of Computer Vision
, 2005
Cited by 32 (12 self)
Abstract. We derive a family of kernels on dynamical systems by applying the Binet-Cauchy theorem to trajectories of states. Our derivation provides a unifying framework for all kernels on dynamical systems currently used in machine learning, including kernels derived from the behavioral framework, diffusion processes, marginalized kernels, kernels on graphs, and the kernels on sets arising from the subspace angle approach. In the case of linear time-invariant systems, we derive explicit formulae for computing the proposed Binet-Cauchy kernels by solving Sylvester equations, and relate the proposed kernels to existing kernels based on cepstrum coefficients and subspace angles. Besides their theoretical appeal, these kernels can be used efficiently in the comparison of video sequences of dynamic scenes that can be modeled as the output of a linear time-invariant dynamical system. One advantage of our kernels is that they take the initial conditions of the dynamical systems into account. As a first example, we use our kernels to compare video sequences of dynamic textures. As a second example, we apply our kernels to the problem of clustering short clips of a movie. Experimental evidence shows superior performance of our kernels. Keywords: Binet-Cauchy theorem, ARMA models and dynamical systems, Sylvester
Random Approximation in Numerical Analysis
 Proceedings of the Conference "Functional Analysis" Essen
, 1994
Cited by 29 (22 self)
... this paper is twofold. In the first part (Sections 2-6) I want to give a survey on recent developments of Monte Carlo complexity. This will include techniques to derive sharp lower bounds as well as the construction of concrete numerical methods which attain these optimal bounds. The field covered here lies at the frontiers of several disciplines, among them theoretical computer science, numerical analysis, probability theory, approximation theory and, to a large extent, functional analysis. I want to stress the latter aspect and show how new techniques from Banach space and operator theory can be applied to Monte Carlo complexity. In the second part I want to present new results: the solution to a problem concerning the Monte Carlo complexity of Fredholm integral equations. This will demonstrate in detail the general approach outlined in part one. We develop a new, fast algorithm: it is a combination of Monte Carlo methods with the Galerkin technique, an approach which seems to be new to this field. The basis functions used for the Galerkin discretization are orthogonal splines of minimal smoothness. They lead to an implementable procedure of minimal computational cost. The paper is organized as follows. In Section 2, the main notions of information-based complexity theory are explained. We cover both the deterministic and the stochastic setting in detail, also for the sake of later comparisons. Some relations to s-number theory are presented in Section 3. The role of the average case in proofs of lower bounds for Monte Carlo methods is explained in Section 4. In the following three sections, we analyse the complexity of basic numerical problems: Section 5 deals with numerical integration and contains classical results on the complexity of Monte Carlo quadrature, toge...
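As a small illustration of the Monte Carlo quadrature mentioned in this abstract, the sketch below estimates an integral over [0, 1] by averaging the integrand at uniform random points; the root-mean-square error of this plain estimator decays like n^(-1/2). The integrand x^2, the sample sizes, and the name mc_integrate are illustrative assumptions and are not taken from the paper.

```python
import random

def mc_integrate(f, n, rng):
    """Estimate the integral of f over [0, 1] from n uniform samples."""
    return sum(f(rng.random()) for _ in range(n)) / n

# Hypothetical test integrand: x^2 on [0, 1], whose exact integral is 1/3.
rng = random.Random(0)
exact = 1.0 / 3.0
for n in (100, 10_000):
    estimate = mc_integrate(lambda x: x * x, n, rng)
    # The absolute error typically shrinks by about 10x per 100x more samples.
    print(n, estimate, abs(estimate - exact))
```

This is only the plain estimator; the abstract's Galerkin-based algorithm for Fredholm integral equations is not reproduced here.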
From Margin To Sparsity
 In Advances in Neural Information Processing Systems 13
, 2001
Cited by 22 (3 self)
We present an improvement of Novikoff's perceptron convergence theorem. Reinterpreting this mistake bound as a margin-dependent sparsity guarantee allows us to give a PAC-style generalisation error bound for the classifier learned by the dual perceptron learning algorithm. The bound value crucially depends on the margin a support vector machine would achieve on the same data set using the same kernel. Ironically, the bound yields better guarantees than are currently available for the support vector solution itself.
1 Introduction
In the last few years there has been a large controversy about the significance of the attained margin, i.e. the smallest real-valued output of a classifier before thresholding, as an indicator of generalisation performance. Results in the VC, PAC and luckiness frameworks seem to indicate that a large margin is a prerequisite for small generalisation error bounds (see [13, 11]). These results caused many researchers to focus on large margin method...
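The link between mistakes and sparsity can be sketched with a dual (kernel) perceptron: each mistake increments one dual coefficient, so the number of nonzero coefficients never exceeds the number of mistakes. The classical Novikoff bound caps the mistakes at (R / gamma)^2 for data of radius R separated through the origin with margin gamma by a unit vector. This is a minimal sketch under assumptions: the toy data, the linear kernel, the reference direction w_star, and all function names are hypothetical, and the bound checked is the classical one, not the paper's improved version.

```python
import math

def linear_kernel(u, v):
    return sum(a * b for a, b in zip(u, v))

def dual_perceptron(X, y, kernel, max_epochs=100):
    """Dual perceptron without bias; returns coefficients and mistake count."""
    n = len(X)
    alpha = [0] * n
    mistakes = 0
    for _ in range(max_epochs):
        clean_pass = True
        for i in range(n):
            f = sum(alpha[j] * y[j] * kernel(X[j], X[i]) for j in range(n))
            if y[i] * f <= 0:        # misclassified (or on the boundary)
                alpha[i] += 1        # one more "vote" for training point i
                mistakes += 1
                clean_pass = False
        if clean_pass:               # a full pass with no mistakes: converged
            break
    return alpha, mistakes

def predict(alpha, X, y, kernel, x):
    f = sum(a * yj * kernel(xj, x) for a, xj, yj in zip(alpha, X, y))
    return 1 if f > 0 else -1

# Hypothetical toy data, linearly separable through the origin.
X = [(2, 1), (1, 2), (3, 3), (-1, -2), (-2, -1), (-3, -2)]
y = [1, 1, 1, -1, -1, -1]
alpha, mistakes = dual_perceptron(X, y, linear_kernel)
support = sum(1 for a in alpha if a > 0)

# Classical Novikoff bound for one assumed unit separator w_star.
w_star = (1 / math.sqrt(2), 1 / math.sqrt(2))
gamma = min(yi * linear_kernel(w_star, xi) for xi, yi in zip(X, y))
R = max(math.sqrt(linear_kernel(xi, xi)) for xi in X)
novikoff_bound = (R / gamma) ** 2
print(mistakes, support, novikoff_bound)
```

Any unit vector that separates the data gives a valid bound; the maximum-margin direction gives the tightest one, which is how the margin a support vector machine would achieve enters the picture.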
Covering numbers for support vector machines
 IEEE Trans. Inform. Theory
, 2002
Cited by 19 (6 self)
Abstract: Support vector (SV) machines are linear classifiers that use the maximum margin hyperplane in a feature space defined by a kernel function. Until recently, the only bounds on the generalization performance of SV machines (within Valiant’s probably approximately correct framework) took no account of the kernel used except in its effect on the margin and radius. More recently, it has been shown that one can bound the relevant covering numbers using tools from functional analysis. In this paper, we show that the resulting bound can be greatly simplified. The new bound involves the eigenvalues of the integral operator induced by the kernel. It shows that the effective dimension depends on the rate of decay of these eigenvalues. We present an explicit calculation of covering numbers for an SV machine using a Gaussian kernel, which is significantly better than that implied by previous results. Index Terms: Covering numbers, entropy numbers, kernel machines, statistical learning theory, support vector (SV) machines.
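To get a feel for the eigenvalue decay this abstract refers to, one can look at the leading eigenvalues of a Gaussian-kernel Gram matrix divided by the number of points. This is used here only as a rough finite-sample stand-in for the integral operator's eigenvalues, which is an assumption of this sketch and not the paper's calculation. The grid, the kernel width, and the helper names are hypothetical; the eigenvalues are computed with plain power iteration and deflation so that the example needs only the standard library.

```python
import math
import random

def gaussian_gram(points, width):
    """Gram matrix of k(a, b) = exp(-(a - b)^2 / (2 * width^2)) on 1-D points."""
    return [[math.exp(-((a - b) ** 2) / (2.0 * width ** 2)) for b in points]
            for a in points]

def top_eigenvalues(matrix, k, iters=500, seed=0):
    """Leading k eigenvalues of a symmetric PSD matrix (power iteration + deflation)."""
    n = len(matrix)
    a = [row[:] for row in matrix]           # work on a copy
    rng = random.Random(seed)
    eigs = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(n)]
        lam = 0.0
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
            lam = norm                       # ||A v|| tends to the top eigenvalue
        eigs.append(lam)
        for i in range(n):                   # deflate: remove the found component
            for j in range(n):
                a[i][j] -= lam * v[i] * v[j]
    return eigs

n = 20
points = [i / (n - 1) for i in range(n)]     # hypothetical grid on [0, 1]
gram = gaussian_gram(points, width=0.5)
eigs = [lam / n for lam in top_eigenvalues(gram, k=4)]
print(eigs)                                  # scaled eigenvalues, largest first
```

Since the Gram matrix has ones on its diagonal, the scaled eigenvalues sum to 1 over the full spectrum, so a fast drop in the first few values indicates that most of the spectrum's mass sits in a handful of directions.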
Entropy Numbers, Operators and Support Vector Kernels
 IEEE TRANSACTIONS ON INFORMATION THEORY
, 1998
Cited by 11 (3 self)
We derive new bounds for the generalization error of feature space machines, such as support vector machines and related regularization networks, by obtaining new bounds on their covering numbers. The proofs are based on a viewpoint that is apparently novel in the field of statistical learning theory. The hypothesis class is described in terms of a linear operator mapping from a possibly infinite-dimensional unit ball in feature space into a finite-dimensional space. The covering numbers of the class are then determined via the entropy numbers of the operator. These numbers, which characterize the degree of compactness of the operator, can be bounded in terms of the eigenvalues of an integral operator induced by the kernel function used by the machine. As a consequence, we are able to theoretically explain the effect of the choice of kernel functions on the generalization performance of support vector machines.
Counterexamples for Boundedness of Pseudodifferential Operators
, 2002
Cited by 10 (5 self)
The Kohn-Nirenberg correspondence assigns to a symbol $\sigma(x, \xi)$ in the space of tempered distributions $\mathcal{S}'(\mathbb{R}^{2d})$ the operator $\sigma(X, D) : \mathcal{S}(\mathbb{R}^d) \to \mathcal{S}'(\mathbb{R}^d)$ defined by $\sigma(X, D) f(x) = \int_{\mathbb{R}^d} \sigma(x, \xi) \hat{f}(\xi) e^{2\pi i x \cdot \xi} \, d\xi$. This is the classical version of pseudodifferential operators that is used in the investigation of partial differential operators, cf. [21]. In the language of physics, the Kohn-Nirenberg correspondence and its relatives such as the Weyl correspondence are methods of quantization. In the language of engineering, they are time-varying filters. The Kohn-Nirenberg correspondence is usually analyzed using methods from hard analysis. The problems arising from the theory of partial differential equations suggest using the classical Hörmander symbol classes $S^m_{\rho,\delta}(\mathbb{R}^{2d})$, which are defined in terms of differentiability conditions [21], [31]. On the other hand, if we introduce the time-frequency shifts $M_\xi T_x f(t) = e^{2\pi i \xi \cdot t} f(t - x)$, (1) then we can write $\sigma(X, D)$ as a formal superposition o...
Regularization in Kernel Learning
, 2008
Cited by 8 (1 self)
Under mild assumptions on the kernel, we obtain the best known error rates in a regularized learning scenario taking place in the corresponding reproducing kernel Hilbert space. The main novelty in the analysis is a proof that one can use a regularization term that grows significantly slower than the standard quadratic growth in the RKHS norm.
Some Limiting Embeddings in Weighted Function Spaces and Related Entropy Numbers
, 1997
Cited by 6 (3 self)
The paper deals with weighted function spaces of type $B^s_{p,q}(\mathbb{R}^n, w(x))$ and $F^s_{p,q}(\mathbb{R}^n, w(x))$, where $w(x)$ is a weight function of at most polynomial growth. Of special interest are weight functions of type $w(x) = (1 + |x|^2)^{\alpha/2} (\log(2 + |x|))^{\mu}$ with $\alpha \ge 0$ and $\mu \in \mathbb{R}$. Our main result deals with estimates for the entropy numbers of compact embeddings between spaces of this type; more precisely, we may extend and tighten some of our previous results in [12]. AMS Subject Classification: 46E35. Key Words: weighted function spaces, compact embeddings, entropy numbers.
Contents: Introduction; 1 Weighted embeddings: the non-limiting case; 2 Limiting embeddings, entropy numbers; 2.1 Estimates from above, an approach via duality arguments; 2.2 Estimates from above, an approach via approximation numbers; 2.3 Estimates...