Results 1  10
of
21
Sparse Geometric Image Representations with Bandelets
, 2004
"... This paper introduces a new class of bases, called bandelet bases, which decompose the image along multiscale vectors that are elongated in the direction of a geometric flow. This geometric flow indicates directions in which the image grey levels have regular variations. The image decomposition in ..."
Abstract

Cited by 148 (4 self)
 Add to MetaCart
This paper introduces a new class of bases, called bandelet bases, which decompose the image along multiscale vectors that are elongated in the direction of a geometric flow. This geometric flow indicates directions in which the image grey levels have regular variations. The image decomposition in a bandelet basis is implemented with a fast subband filtering algorithm. Bandelet bases lead to optimal approximation rates for geometrically regular images. For image compression and noise removal applications, the geometric flow is optimized with fast algorithms, so that the resulting bandelet basis produces a minimum distortion. Comparisons are made with wavelet image compression and noise removal algorithms.
Empirical margin distributions and bounding the generalization error of combined classifiers
 Ann. Statist
, 2002
"... Dedicated to A.V. Skorohod on his seventieth birthday We prove new probabilistic upper bounds on generalization error of complex classifiers that are combinations of simple classifiers. Such combinations could be implemented by neural networks or by voting methods of combining the classifiers, such ..."
Abstract

Cited by 112 (8 self)
 Add to MetaCart
Dedicated to A.V. Skorohod on his seventieth birthday We prove new probabilistic upper bounds on generalization error of complex classifiers that are combinations of simple classifiers. Such combinations could be implemented by neural networks or by voting methods of combining the classifiers, such as boosting and bagging. The bounds are in terms of the empirical distribution of the margin of the combined classifier. They are based on the methods of the theory of Gaussian and empirical processes (comparison inequalities, symmetrization method, concentration inequalities) and they improve previous results of Bartlett (1998) on bounding the generalization error of neural networks in terms of ℓ1norms of the weights of neurons and of Schapire, Freund, Bartlett and Lee (1998) on bounding the generalization error of boosting. We also obtain rates of convergence in Lévy distance of empirical margin distribution to the true margin distribution uniformly over the classes of classifiers and prove the optimality of these rates.
Convergence rates of posterior distributions
 Ann. Statist
, 2000
"... We consider the asymptotic behavior of posterior distributions and Bayes estimators for infinitedimensional statistical models. We give general results on the rate of convergence of the posterior measure. These are applied to several examples, including priors on finite sieves, logspline models, D ..."
Abstract

Cited by 43 (11 self)
 Add to MetaCart
We consider the asymptotic behavior of posterior distributions and Bayes estimators for infinitedimensional statistical models. We give general results on the rate of convergence of the posterior measure. These are applied to several examples, including priors on finite sieves, logspline models, Dirichlet processes and interval censoring. 1. Introduction. Suppose
Rademacher Processes And Bounding The Risk Of Function Learning
 High Dimensional Probability II
, 1999
"... We construct data dependent upper bounds on the risk in function learning problems. The bounds are based on the local norms of the Rademacher process indexed by the underlying function class and they do not require prior knowledge about the distribution of training examples or any specific propertie ..."
Abstract

Cited by 39 (6 self)
 Add to MetaCart
We construct data dependent upper bounds on the risk in function learning problems. The bounds are based on the local norms of the Rademacher process indexed by the underlying function class and they do not require prior knowledge about the distribution of training examples or any specific properties of the function class. Using Talagrand's type concentration inequalities for empirical and Rademacher processes, we show that the bounds hold with high probability that decreases exponentially fast when the sample size grows. In typical situations that are frequently encountered in the theory of function learning, the bounds give nearly optimal rate of convergence of the risk to zero. 1. Local Rademacher norms and bounds on the risk: main results Let (S; A) be a measurable space and let F be a class of Ameasurable functions from S into [0; 1]: Denote P(S) the set of all probability measures on (S; A): Let f 0 2 F be an unknown target function. Given a probability measure P 2 P(S) (also unknown), let (X 1 ; : : : ; Xn ) be an i.i.d. sample in (S; A) with common distribution P (defined on a probability space(\Omega ; \Sigma; P)). In computer learning theory, the problem of estimating f 0 ; based on the labeled sample (X 1 ; Y 1 ); : : : ; (Xn ; Yn ); where Y j := f 0 (X j ); j = 1; : : : ; n; is referred to as function learning problem. The so called concept learning is a special case of function learning. In this case, F := fI C : C 2 Cg; where C ae A is called a class of concepts (see Vapnik (1998), Vidyasagar (1996), Devroye, Gyorfi and Lugosi (1996) for the account on statistical learning theory). The goal of function learning is to find an estimate
The Mixture Approach to Universal Model Selection
 ECOLE NORMALE SUPERIEURE
, 1997
"... We build a model selection algorithm which has a mean Kullback risk upper bounded by the mean risk for the best model applied to half the sample augmented by an explicit penalty term of order one over the sample size. This estimator is not a "true" selection rule, but instead an adaptive convex comb ..."
Abstract

Cited by 28 (2 self)
 Add to MetaCart
We build a model selection algorithm which has a mean Kullback risk upper bounded by the mean risk for the best model applied to half the sample augmented by an explicit penalty term of order one over the sample size. This estimator is not a "true" selection rule, but instead an adaptive convex combination of the models: this mixture approach is inspired by the context tree weighting method of information theory by Willems, Shtarkov and Tjalkens [6, 7], which is a "universal" data compression algorithm for stationary binary sources. Our algorithm is "universal" with respect to the statistical mean Kullback risk in the sense that it is almost optimal for any exchangeable sample distribution. We give an application to the estimation of a probability measure by adaptive histograms. We detail a factorised computation of the mixture which weights M models performing a number of operations of order log(M ), when the subdivisions underlying the histograms have nested cells. The end of the p...
Modern statistical estimation via oracle inequalities
, 2006
"... A number of fundamental results in modern statistical theory involve thresholding estimators. This survey paper aims at reconstructing the history of how thresholding rules came to be popular in statistics and describing, in a not overly technical way, the domain of their application. Two notions pl ..."
Abstract

Cited by 20 (0 self)
 Add to MetaCart
A number of fundamental results in modern statistical theory involve thresholding estimators. This survey paper aims at reconstructing the history of how thresholding rules came to be popular in statistics and describing, in a not overly technical way, the domain of their application. Two notions play a fundamental role in our narrative: sparsity and oracle inequalities. Sparsity is a property of the object to estimate, which seems to be characteristic of many modern problems, in statistics as well as applied mathematics and theoretical computer science, to name a few. ‘Oracle inequalities’ are a powerful decisiontheoretic tool which has served to understand the optimality of thresholding rules, but which has many other potential applications, some of which we will discuss. Our story is also the story of the dialogue between statistics and applied harmonic analysis. Starting with the work of Wiener, we will see that certain representations emerge as being optimal for estimation. A leitmotif throughout
GAUSSIAN MODEL SELECTION WITH AN UNKNOWN VARIANCE
 SUBMITTED TO THE ANNALS OF STATISTICS
, 2007
"... Let Y be a Gaussian vector whose components are independent with a common unknown variance. We consider the problem of estimating the mean µ of Y by model selection. More precisely, we start with a collection S = {Sm, m ∈ M} of linear subspaces of R n and associate to each of these the leastsquares ..."
Abstract

Cited by 15 (9 self)
 Add to MetaCart
Let Y be a Gaussian vector whose components are independent with a common unknown variance. We consider the problem of estimating the mean µ of Y by model selection. More precisely, we start with a collection S = {Sm, m ∈ M} of linear subspaces of R n and associate to each of these the leastsquares estimator of µ on Sm. Then, we use a data driven penalized criterion in order to select one estimator among these. Our first objective is to analyze the performance of estimators associated to classical criteria such as FPE, AIC, BIC and AMDL. Our second objective is to propose better penalties that are versatile enough to take into account both the complexity of the collection S and the sample size. Then we apply those to solve various statistical problems such as variable selection, change point detections and signal estimation among others. Our results are based on a nonasymptotic risk bound with respect to the Euclidean loss for the selected estimator. Some analogous results are also established for the Kullback loss.
Penalized nonparametric mean square estimation of the coefficients of diffusion processes
 Bernoulli
, 2007
"... diffusion processes. ..."