Results 1–10 of 100
Compressed sensing
 IEEE Trans. Inform. Theory
Abstract

Cited by 1730 (18 self)
Abstract—Suppose x is an unknown vector in R^m (a digital image or signal); we plan to measure n general linear functionals of x and then reconstruct. If x is known to be compressible by transform coding with a known transform, and we reconstruct via the nonlinear procedure defined here, the number of measurements n can be dramatically smaller than the size m. Thus, certain natural classes of images with m pixels need only n = O(m^{1/4} log^{5/2}(m)) nonadaptive nonpixel samples for faithful recovery, as opposed to the usual m pixel samples. More specifically, suppose x has a sparse representation in some orthonormal basis (e.g., wavelet, Fourier) or tight frame (e.g., curvelet, Gabor), so the coefficients belong to an ℓ_p ball for 0 < p ≤ 1. The N most important coefficients in that expansion allow reconstruction with ℓ_2 error O(N^{1/2−1/p}) ...
For Most Large Underdetermined Systems of Linear Equations the Minimal ℓ1-norm Solution is also the Sparsest Solution
 Comm. Pure Appl. Math
, 2004
Abstract

Cited by 342 (9 self)
We consider linear equations y = Φα where y is a given vector in R^n, Φ is a given n by m matrix with n < m ≤ An, and we wish to solve for α ∈ R^m. We suppose that the columns of Φ are normalized to unit ℓ_2 norm, and we place uniform measure on such Φ. We prove the existence of ρ = ρ(A) so that for large n, and for all Φ's except a negligible fraction, the following property holds: For every y having a representation y = Φα_0 by a coefficient vector α_0 ∈ R^m with fewer than ρ·n nonzeros, the solution α_1 of the ℓ_1 minimization problem min ‖α‖_1 subject to Φα = y is unique and equal to α_0. In contrast, heuristic attempts to sparsely solve such systems, such as greedy algorithms and thresholding, perform poorly in this challenging setting. The techniques include the use of random proportional embeddings and almost-spherical sections in Banach space theory, and deviation bounds for the eigenvalues of random Wishart matrices.
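The ℓ_1 minimization problem in this abstract can be solved as an ordinary linear program. A minimal sketch, with illustrative dimensions chosen here (not taken from the paper) and `scipy` as an assumed dependency:

```python
import numpy as np
from scipy.optimize import linprog

# Recover a sparse alpha0 from y = Phi @ alpha0 by solving
# min ||alpha||_1 subject to Phi @ alpha = y.
rng = np.random.default_rng(0)
n, m, k = 40, 100, 5                      # n < m; k nonzeros (well below rho*n)
Phi = rng.standard_normal((n, m))
Phi /= np.linalg.norm(Phi, axis=0)        # unit ell-2 columns, as assumed above

alpha0 = np.zeros(m)
alpha0[rng.choice(m, size=k, replace=False)] = rng.standard_normal(k)
y = Phi @ alpha0

# Cast as an LP: alpha = u - v with u, v >= 0; the objective 1'u + 1'v
# equals ||alpha||_1 at the optimum.
c = np.ones(2 * m)
res = linprog(c, A_eq=np.hstack([Phi, -Phi]), b_eq=y, bounds=(0, None))
alpha1 = res.x[:m] - res.x[m:]
print(np.linalg.norm(alpha1 - alpha0))    # typically near zero in this regime
```

The splitting `alpha = u - v` is the standard trick for making an ℓ_1 objective linear; with these proportions (k/n well below the paper's ρ) the LP solution typically coincides with the sparse generator.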
PEGASUS: A policy search method for large MDPs and POMDPs
 In Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence
, 2000
Abstract

Cited by 207 (7 self)
We propose a new approach to the problem of searching a space of policies for a Markov decision process (MDP) or a partially observable Markov decision process (POMDP), given a model. Our approach is based on the following observation: Any (PO)MDP can be transformed into an "equivalent" POMDP in which all state transitions (given the current state and action) are deterministic. This reduces the general problem of policy search to one in which we need only consider POMDPs with deterministic transitions. We give a natural way of estimating the value of all policies in these transformed POMDPs. Policy search is then simply performed by searching for a policy with high estimated value. We also establish conditions under which our value estimates will be good, recovering theoretical results similar to those of Kearns, Mansour and Ng [7], but with "sample complexity" bounds that have only a polynomial rather than exponential dependence on the horizon time. Our method appl...
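The core observation, that pre-drawing all randomness makes transitions deterministic and policy values directly comparable, can be sketched on a toy chain. The chain, reward, and policies below are invented for illustration:

```python
import numpy as np

# PEGASUS-style evaluation: draw the randomness once ("scenarios"); each
# scenario makes transitions deterministic given (state, action), so every
# policy is scored on the same fixed scenarios.
rng = np.random.default_rng(1)
H, n_scen, goal = 20, 50, 5               # horizon, scenarios, target state
scenarios = rng.uniform(size=(n_scen, H))

def step(s, a, u):
    # move in direction a (+1/-1) with prob 0.8, decided by the pre-drawn u
    return s + (a if u < 0.8 else -a)

def value(policy):
    # deterministic value estimate: average H-step reward over the scenarios
    total = 0.0
    for us in scenarios:
        s = 0
        for u in us:
            s = step(s, policy(s), u)
            total -= abs(s - goal)        # reward for staying near the goal
    return total / n_scen

toward = lambda s: 1 if s < goal else -1  # heads for the goal
away = lambda s: -1                       # always steps left
print(value(toward) > value(away))
```

Because `value` is a deterministic function of the policy, policy search reduces to ordinary optimization over it, which is the paper's point.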
Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path
 Machine Learning Journal (2008) 71:89–129
, 2008
Convergence rates of posterior distributions
 Ann. Statist
, 2000
Abstract

Cited by 43 (11 self)
We consider the asymptotic behavior of posterior distributions and Bayes estimators for infinite-dimensional statistical models. We give general results on the rate of convergence of the posterior measure. These are applied to several examples, including priors on finite sieves, log-spline models, Dirichlet processes and interval censoring.
Efficient Estimation for the Cox Model with Interval Censoring
 Annals of Statistics
, 1996
Abstract

Cited by 29 (6 self)
The maximum likelihood estimator (MLE) for the proportional hazards model with current status data is studied. It is shown that the MLE for the regression parameter is asymptotically normal with √n convergence rate and achieves the information bound, even though the MLE for the baseline cumulative hazard function only converges at n^{1/3} rate. Estimation of the asymptotic variance matrix for the MLE of the regression parameter is also considered. To prove our main results, we also establish a general theorem showing that the MLE of the finite-dimensional parameter in a class of semiparametric models is asymptotically efficient even though the MLE of the infinite-dimensional parameter converges at a rate slower than √n. The results are illustrated by applying them to a data set from a tumorigenicity study. 1. Introduction. In many survival analysis problems, we are interested in the relationship between a failure time T and a vector of covariates Z. However, it is common that obs...
Projection estimation in multiple regression with application to functional ANOVA models
 Ann. Statist
, 1998
Abstract

Cited by 28 (6 self)
A general theory on rates of convergence of the least-squares projection estimate in multiple regression is developed. The theory is applied to the functional ANOVA model, where the multivariate regression function is modeled as a specified sum of a constant term, main effects (functions of one variable) and selected interaction terms (functions of two or more variables). The least-squares projection is onto an approximating space constructed from arbitrary linear spaces of functions and their tensor products respecting the assumed ANOVA structure of the regression function. The linear spaces that serve as building blocks can be any of the ones commonly used in practice: polynomials, trigonometric polynomials, splines, wavelets and finite elements. The rate of convergence result that is obtained reinforces the intuition that low-order ANOVA modeling can achieve dimension reduction and thus overcome the curse of dimensionality. Moreover, the components of the projection estimate in an appropriately defined ANOVA decomposition provide consistent estimates of the corresponding components of the regression function. When the regression function does not satisfy the assumed ANOVA form, the projection estimate converges to its best approximation of that form.
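A minimal sketch of such a least-squares projection with an additive (main effects only) ANOVA structure; the regression function and the polynomial building blocks below are invented for illustration:

```python
import numpy as np

# Fit f(x1, x2) ~ constant + g1(x1) + g2(x2) by projecting onto a linear
# space built from per-coordinate polynomial spaces (no interaction terms).
rng = np.random.default_rng(2)
n = 500
x = rng.uniform(-1, 1, size=(n, 2))
f = 1.0 + np.sin(np.pi * x[:, 0]) + x[:, 1] ** 2   # truly additive target
y = f + 0.1 * rng.standard_normal(n)

def basis(x):
    # constant term plus degree-4 polynomial main effects in each coordinate
    cols = [np.ones(len(x))]
    for j in range(x.shape[1]):
        for d in range(1, 5):
            cols.append(x[:, j] ** d)
    return np.column_stack(cols)

coef, *_ = np.linalg.lstsq(basis(x), y, rcond=None)  # least-squares projection
fhat = basis(x) @ coef
print(np.mean((fhat - f) ** 2))           # small, since the additive form holds
```

Swapping the monomials for splines or trigonometric polynomials, or adding tensor-product columns for selected interactions, stays within the framework the abstract describes.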
Modulation Estimators and Confidence Sets
 Ann. Statist
, 1999
Abstract

Cited by 27 (5 self)
An unknown signal plus white noise is observed at n discrete time points. Within a large convex class of linear estimators of the signal, we choose the candidate estimator that minimizes estimated quadratic risk. By construction, the chosen estimator is nonlinear. This estimation is done after orthogonal transformation of the data to a reasonable coordinate system. The procedure adaptively tapers the coefficients of the transformed data. If the class of candidate estimators satisfies a uniform entropy condition, then the chosen estimator is asymptotically minimax in Pinsker's sense over certain ellipsoids in the parameter space and shares one such asymptotic minimax property with the James–Stein estimator. We describe computational algorithms for the chosen estimator and construct confidence sets for the unknown signal. These confidence sets are centered at the estimator, have correct asymptotic coverage probability, and have relatively small risk as set-valued estimators of the signal.
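A minimal sketch of the risk-minimization idea, using invented data, a DCT as the orthogonal coordinate system, a known noise level, and the simplest candidate class (nested projections that keep the first k coefficients); `scipy` is an assumed dependency:

```python
import numpy as np
from scipy.fft import dct, idct

rng = np.random.default_rng(3)
n, sigma = 256, 0.5
t = np.linspace(0.0, 1.0, n)
signal = np.cos(2 * np.pi * t) + 0.5 * np.cos(6 * np.pi * t)
y = signal + sigma * rng.standard_normal(n)

z = dct(y, norm="ortho")                  # orthogonal transform of the data
cs = np.cumsum(z ** 2)
ks = np.arange(1, n + 1)
# unbiased risk estimate for "keep the first k coefficients":
# k*sigma^2 + sum_{i >= k} (z_i^2 - sigma^2)
risk = ks * sigma ** 2 + (cs[-1] - cs[ks - 1]) - (n - ks) * sigma ** 2
k = ks[np.argmin(risk)]
est = idct(np.where(np.arange(n) < k, z, 0.0), norm="ortho")
print(np.mean((est - signal) ** 2) < np.mean((y - signal) ** 2))
```

The paper's procedure tapers coefficients rather than truncating them and lets the class of candidate tapers be much richer; the projection family above is only the simplest member of that setup.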
Sample Complexity for Learning Recurrent Perceptron Mappings
 IEEE Trans. Inform. Theory
, 1996
Abstract

Cited by 23 (10 self)
Recurrent perceptron classifiers generalize the classical perceptron model. They take into account those correlations and dependences among input coordinates which arise from linear digital filtering. This paper provides tight bounds on the sample complexity associated to the fitting of such models to experimental data. Keywords: perceptrons, recurrent models, neural networks, learning, Vapnik–Chervonenkis dimension. 1 Introduction. One of the most popular approaches to binary pattern classification, underlying many statistical techniques, is based on perceptrons or linear discriminants; see for instance the classical reference [9]. In this context, one is interested in classifying k-dimensional input patterns v = (v_1, ..., v_k) into two disjoint classes A+ and A−. A perceptron P which classifies vectors into A+ and A− is characterized by a vector (of "weights") c ∈ R^k, and operates as follows. One forms the inner product c·v = c_1 v_1 + ... + c_k v_k. I...
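The classical (non-recurrent) perceptron the abstract starts from can be sketched in a few lines; the target weights and separable data below are invented for illustration:

```python
import numpy as np

# Classify v into A+ when c.v > 0, learning c with the standard
# mistake-driven perceptron update.
rng = np.random.default_rng(4)
k = 3
c_true = np.array([1.0, -2.0, 0.5])        # hypothetical target weights
V = rng.standard_normal((400, k))
margins = V @ c_true
mask = np.abs(margins) > 0.5               # keep a comfortable margin,
V, labels = V[mask], np.sign(margins[mask])  # so convergence is guaranteed

c = np.zeros(k)
for _ in range(100):                       # passes over the sample
    mistakes = 0
    for v, lab in zip(V, labels):
        if np.sign(v @ c) != lab:          # misclassified: nudge c toward v
            c += lab * v
            mistakes += 1
    if mistakes == 0:
        break

print(np.all(np.sign(V @ c) == labels))
```

The recurrent classifiers the paper studies feed filtered histories of the input into this same inner-product test, which is what changes the VC-dimension analysis.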
Agnostic PACLearning of Functions on Analog Neural Nets
 Neural Computation
, 1994
Abstract

Cited by 21 (7 self)
We consider learning on multilayer neural nets with piecewise polynomial activation functions and a fixed number k of numerical inputs. We exhibit arbitrarily large network architectures for which efficient and provably successful learning algorithms exist in the rather realistic refinement of Valiant's model for probably approximately correct learning ("PAC-learning") where no a priori assumptions are required about the "target function" (agnostic learning), arbitrary noise is permitted in the training sample, and the target outputs as well as the network outputs may be arbitrary reals. The number of computation steps of the learning algorithm LEARN that we construct is bounded by a polynomial in the bit-length n of the fixed number of input variables, in the bound s for the allowed bit-length of weights, in 1/ε, where ε is some arbitrary given bound for the true error of the neural net after training, and in 1/δ, where δ is some arbitrary given bound for the probability t...