Results 1–10 of 44
Information-Theoretic Determination of Minimax Rates of Convergence
Ann. Stat., 1997
Abstract

Cited by 158 (24 self)
In this paper, we present some general results determining minimax bounds on statistical risk for density estimation based on certain informationtheoretic considerations. These bounds depend only on metric entropy conditions and are used to identify the minimax rates of convergence.
Generalization Performance of Regularization Networks and Support . . .
IEEE Transactions on Information Theory, 2001
Abstract

Cited by 80 (17 self)
We derive new bounds for the generalization error of kernel machines, such as support vector machines and related regularization networks by obtaining new bounds on their covering numbers. The proofs make use of a viewpoint that is apparently novel in the field of statistical learning theory. The hypothesis class is described in terms of a linear operator mapping from a possibly infinite-dimensional unit ball in feature space into a finite-dimensional space. The covering numbers of the class are then determined via the entropy numbers of the operator. These numbers, which characterize the degree of compactness of the operator, can be bounded in terms of the eigenvalues of an integral operator induced by the kernel function used by the machine. As a consequence, we are able to theoretically explain the effect of the choice of kernel function on the generalization performance of support vector machines.
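The chain of reasoning in this abstract (kernel eigenvalues → entropy numbers → covering numbers) can be illustrated numerically. The following sketch is not from the paper; it uses the standard Nyström-style proxy of a Gram-matrix eigendecomposition for the kernel integral operator's eigenvalues, whose rapid decay is what drives the bounds. The kernel choice and data are illustrative assumptions.

```python
import numpy as np

def rbf_gram(X, gamma=1.0):
    """Gram matrix of the RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 1))

# Eigenvalues of K/n approximate the eigenvalues of the kernel
# integral operator on the sampling distribution.
eigvals = np.sort(np.linalg.eigvalsh(rbf_gram(X)) / len(X))[::-1]

# Fast eigenvalue decay -> small entropy numbers of the feature-space
# operator -> small covering numbers -> tighter generalization bounds.
print(eigvals[:5])
```

For a smooth kernel such as the RBF the printed spectrum drops off nearly exponentially, which is the compactness property the abstract exploits.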
Combining Discriminant Models with New Multi-Class SVMs
2000
Abstract

Cited by 49 (10 self)
The idea of combining models instead of simply selecting the best one, in order to improve performance, is well known in statistics and has a long theoretical background. However, making full use of theoretical results is ordinarily subject to the satisfaction of strong hypotheses (weak correlation among the errors, availability of large training sets, possibility to rerun the training procedure an arbitrary number of times, etc.). In contrast, the practitioner who has to make a decision is frequently faced with the difficult problem of combining a given set of pre-trained classifiers, with highly correlated errors, using only a small training sample. Overfitting is then the main risk, which cannot be overcome but with a strict complexity control of the combiner selected. This suggests that SVMs, which implement the SRM inductive principle, should be well suited for these difficult situations. Investigating this idea, we introduce a new family of multi-class SVMs and assess them as ensemble methods on a real-world problem. This task, protein secondary structure prediction, is an open problem in biocomputing for which model combination appears to be an issue of central importance. Experimental evidence highlights the gain in quality resulting from combining some of the most widely used prediction methods with our SVMs rather than with the ensemble methods traditionally used in the field. The gain is increased when the outputs of the combiners are post-processed with a simple DP algorithm.
Reinforcement Learning by Policy Search
2000
Abstract

Cited by 31 (2 self)
One objective of artificial intelligence is to model the behavior of an intelligent agent interacting with its environment. The environment's transformations could be modeled as a Markov chain, whose state is partially observable to the agent and affected by its actions; such processes are known as partially observable Markov decision processes (POMDPs). While the environment's dynamics are assumed to obey certain rules, the agent does not know them and must learn. In this dissertation we focus on the agent's adaptation as captured by the reinforcement learning framework. Reinforcement learning means learning a policy, a mapping of observations into actions, based on feedback from the environment. The learning can be viewed as browsing a set of policies while evaluating them by trial through interaction with the environment. The set of policies being searched is constrained by the architecture of the agent's controller. POMDPs require a controller to have a memory. We investigate various architectures for controllers with memory, including controllers with external memory, finite state controllers and distributed controllers for multi-agent systems. For these various controllers we work out the details of the algorithms which learn by ascending the gradient of expected cumulative reinforcement. Building on statistical learning theory and experiment design theory, a policy evaluation algorithm is developed for the case of experience reuse. We address the question of sufficient experience for uniform convergence of policy evaluation and obtain sample complexity bounds for various estimators. Finally, we demonstrate the performance of the proposed algorithms on several domains, the most complex of which is simulated adaptive packet routing in a telecommunication network.
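As a minimal illustration of "learning by ascending the gradient of expected cumulative reinforcement", here is a hypothetical REINFORCE-style policy search on a two-armed bandit, the simplest fully observable special case; the problem, parameters, and learning rate are illustrative and not taken from the dissertation.

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Two-armed bandit: action 0 pays 0.2, action 1 pays 1.0.
rewards = np.array([0.2, 1.0])
theta = np.zeros(2)          # policy parameters
rng = np.random.default_rng(1)
lr = 0.1

for _ in range(2000):
    p = softmax(theta)
    a = rng.choice(2, p=p)           # sample an action from the policy
    r = rewards[a]
    grad = -p                        # d/dtheta log pi(a) = e_a - pi
    grad[a] += 1.0
    theta = theta + lr * r * grad    # ascend the reinforcement gradient

p = softmax(theta)           # probability mass should concentrate on action 1
```

The same score-function gradient, summed over a trajectory and combined with memory in the controller, is the workhorse behind the POMDP algorithms the abstract describes.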
Learning pattern classification: A survey
IEEE Trans. Inform. Theory, 1998
Abstract

Cited by 20 (4 self)
Classical and recent results in statistical pattern recognition and learning theory are reviewed in a two-class pattern classification setting. This basic model best illustrates intuition and analysis techniques while still containing the essential features and serving as a prototype for many applications. Topics discussed include nearest neighbor, kernel, and histogram methods, Vapnik–Chervonenkis theory, and neural networks. The presentation and the large (though nonexhaustive) list of references are geared to provide a useful overview of this field for both specialists and nonspecialists.
Minimax nonparametric classification, Part I: Rates of convergence
1998
Abstract

Cited by 20 (2 self)
This paper studies minimax aspects of nonparametric classification. We first study minimax estimation of the conditional probability of a class label, given the feature variable. This function, say f, is assumed to be in a general nonparametric class. We show the minimax rate of convergence under squared L2 loss is determined by the massiveness of the class as measured by metric entropy. The second part of the paper studies minimax classification. The loss of interest is the difference between the probability of misclassification of a classifier and that of the Bayes decision. As is well-known, an upper bound on risk for estimating f gives an upper bound on the risk for classification, but the rate is known to be suboptimal for the class of monotone functions. This suggests that one does not have to estimate f well in order to classify well. However, we show that the two problems are in fact of the same difficulty in terms of rates of convergence under a sufficient condition, which is satisfied by many function classes including Besov (Sobolev), Lipschitz, and bounded variation. This is somewhat surprising in view of a result of Devroye, Györfi, and Lugosi (1996).
Mixing in turbulent jets: scalar measures and isosurface geometry
J. Fluid Mech., 1996
Abstract

Cited by 14 (3 self)
Experiments have been conducted to investigate mixing and the geometry of scalar isosurfaces in turbulent jets. Specifically, we have obtained high-resolution, high-signal-to-noise-ratio images of the jet-fluid concentration in the far field of round, liquid-phase, turbulent jets, in the Reynolds number range 4.5 × 10³ < Re < 18 × 10³, using laser-induced-fluorescence imaging techniques. Analysis of these data indicates that this Reynolds-number range spans a mixing transition in the far field of turbulent jets. This is manifested in the probability-density function of the scalar field, as well as in measures of the scalar isosurfaces. Classical as well as fractal measures of these isosurfaces have been computed, from small to large spatial scales, and are found to be functions of both scalar threshold and Reynolds number. The coverage of level sets of jet-fluid concentration in the two-dimensional images is found to possess a scale-dependent fractal dimension that increases continuously with increasing scale, from near unity, at the smallest scales, to 2, at the largest scales. The geometry of the scalar isosurfaces is, therefore, more complex than power-law fractal, exhibiting an increasing complexity with increasing scale. This behaviour necessitates a scale-dependent generalization of power-law-fractal geometry. A connection between scale-dependent fractal geometry and the distribution of scales is established and used to compute the distribution of spatial scales in the flow.
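The scale-dependent box-counting idea in this abstract can be sketched on synthetic data: count the boxes of side b that a level set occupies, then difference the log-counts between successive scales to get a local dimension D(b) that need not be constant. The toy field, threshold, and box sizes below are illustrative assumptions, not the paper's jet data.

```python
import numpy as np

def box_count(mask, box):
    """Number of box x box cells containing at least one level-set pixel."""
    n = mask.shape[0] // box * box
    blocks = mask[:n, :n].reshape(n // box, box, n // box, box)
    return int(blocks.any(axis=(1, 3)).sum())

# Synthetic "scalar field" on a 256x256 grid: smooth cells plus noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 4 * np.pi, 256)
field = np.sin(x)[:, None] * np.cos(x)[None, :] \
        + 0.1 * rng.standard_normal((256, 256))
level_set = np.abs(field) < 0.05          # pixels near the zero isocontour

# Local box-counting dimension between nested scales:
# D = log(N(b) / N(2b)) / log 2
boxes = [2, 4, 8, 16, 32]
counts = [box_count(level_set, b) for b in boxes]
dims = [np.log(counts[i] / counts[i + 1]) / np.log(2)
        for i in range(len(boxes) - 1)]
print(dims)
```

If the printed D(b) varies systematically with b rather than sitting at one value, the set is not a power-law fractal, which is the qualitative behaviour the abstract reports for jet-fluid isosurfaces.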
On the representation of smooth functions on the sphere using finitely many bits
Issue 3, 2005
Abstract

Cited by 14 (11 self)
We discuss the construction of a parsimonious representation of smooth functions on the Euclidean sphere using finitely many bits, in the sense of metric entropy. The smoothness of the functions is measured by Besov spaces. The bit representation is obtained by uniform quantization on the values of a polynomial operator at scattered sites on the sphere. For each cap, one can identify a certain number of bits, commensurable with the local smoothness of the target function on that cap and the volume of that cap, and obtained using the values of the polynomial operator near that cap. The polynomial operator is calculated using either spherical harmonic coefficients or, in the case of uniform approximation, values of the function at scattered sites on the sphere. The localization properties of the polynomial operator are demonstrated by a characterization of local smoothness of the target function near a point in terms of the values of these operators near the point in question.
Regression and Classification with Regularization
2002
Abstract

Cited by 12 (6 self)
The purpose of this chapter is to present a theoretical framework for the problem of learning from examples. Learning from examples can be regarded [13] as the problem of approximating a multivariate function from sparse data. The function can be real valued as in regression or binary valued as in classification. The problem of approximating a function from sparse data is ill-posed and a classical solution is regularization theory [19]. Regularization theory, as we will consider here, formulates the regression problem as a variational problem of finding the function f that minimizes the functional

H[f] = (1/ℓ) Σ_{i=1}^{ℓ} V(y_i, f(x_i)) + λ ||f||_K²    (6.1)

where V(·, ·) is a loss function (in the classical formulation the square loss was used), ||f||_K is a norm in a Reproducing Kernel Hilbert Space (RKHS) H defined by the positive definite function K, ℓ is the number of data points or examples (the ℓ training pairs (x_i, y_i)) and λ is the regularization parameter. Under rather general conditions [14, 22, ...
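For the square loss, the minimizer of a functional of the form (6.1) has the well-known representer-theorem closed form f(x) = Σ_i c_i K(x_i, x) with c = (K + λℓI)⁻¹ y, i.e. kernel ridge regression. A small self-contained sketch follows; the toy data, RBF kernel, and parameter values are my illustrative choices, not the chapter's.

```python
import numpy as np

def rbf(a, b, gamma=10.0):
    """RBF kernel matrix K[i, j] = exp(-gamma * (a_i - b_j)^2) for 1-D inputs."""
    return np.exp(-gamma * (a[:, None] - b[None, :]) ** 2)

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 40)                          # training inputs
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(40)   # noisy targets

# Minimize (1/l) sum (y_i - f(x_i))^2 + lam * ||f||_K^2 via the
# closed form c = (K + lam * l * I)^{-1} y.
lam = 1e-3
K = rbf(x, x)
c = np.linalg.solve(K + lam * len(x) * np.eye(len(x)), y)

xt = np.linspace(0, 1, 200)
f = rbf(xt, x) @ c      # the regularized solution evaluated on a grid
```

Raising lam trades data fit for a smaller RKHS norm, which is exactly the trade-off the variational formulation encodes.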
Part 1: Overview of the Probably Approximately Correct (PAC) Learning Framework
1995
Abstract

Cited by 8 (0 self)
Here we survey some recent theoretical results on the efficiency of machine learning algorithms. The main tool described is the notion of Probably Approximately Correct (PAC) learning, introduced by Valiant. We define this learning model and then look at some of the results obtained in it. We then consider some criticisms of the PAC model and the extensions proposed to address these criticisms. Finally, we look briefly at other models recently proposed in computational learning theory.
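As a worked instance of the kind of guarantee the PAC framework provides, here is the standard sample-complexity bound for a finite hypothesis class and a consistent learner; it is a textbook result, not one specific to this survey, and the example numbers are arbitrary.

```python
import math

def pac_sample_bound(h_size, eps, delta):
    """Classic PAC bound for a finite hypothesis class H and a consistent
    learner: m >= (1/eps) * (ln|H| + ln(1/delta)) examples suffice for
    true error <= eps with probability >= 1 - delta."""
    return math.ceil((math.log(h_size) + math.log(1.0 / delta)) / eps)

# e.g. |H| = 2^20 hypotheses, 5% error, 99% confidence
m = pac_sample_bound(2 ** 20, eps=0.05, delta=0.01)
print(m)  # -> 370
```

Note the bound is logarithmic in |H| and 1/delta but linear in 1/eps, which is why PAC analyses focus on controlling the effective size (or VC dimension) of the hypothesis class.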