Results 1 – 10 of 299
Regularized discriminant analysis
J. Amer. Statist. Assoc., 1989
"... Linear and quadratic discriminant analysis are considered in the small sample highdimensional setting. Alternatives to the usual maximum likelihood (plugin) estimates for the covariance matrices are proposed. These alternatives are characterized by two parameters, the values of which are customize ..."
Abstract

Cited by 310 (2 self)
Abstract: Linear and quadratic discriminant analysis are considered in the small-sample, high-dimensional setting. Alternatives to the usual maximum likelihood (plug-in) estimates for the covariance matrices are proposed. These alternatives are characterized by two parameters, the values of which are customized to individual situations by jointly minimizing a sample-based estimate of future misclassification risk. Computationally fast implementations are presented, and the efficacy of the approach is examined through simulation studies and application to data. These studies indicate that in many circumstances dramatic gains in classification accuracy can be achieved.
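A minimal NumPy sketch of the two-parameter covariance regularization this abstract describes: lam blends each class covariance toward the pooled covariance, and gamma then shrinks the blend toward a scaled identity. The function name and the count-weighted blending form are illustrative assumptions, not the paper's exact estimator.

```python
import numpy as np

def rda_covariance(S_k, S_pooled, n_k, n, lam, gamma):
    """Two-parameter regularized class covariance in the spirit of RDA.

    lam in [0, 1] blends the class covariance toward the pooled covariance;
    gamma in [0, 1] then shrinks the result toward (trace/p) * identity.
    """
    # Count-weighted blend of class scatter and pooled scatter.
    num = (1 - lam) * (n_k * S_k) + lam * (n * S_pooled)
    den = (1 - lam) * n_k + lam * n
    S_lam = num / den
    # Shrink toward a scaled identity to stabilize small eigenvalues.
    p = S_k.shape[0]
    return (1 - gamma) * S_lam + gamma * (np.trace(S_lam) / p) * np.eye(p)
```

In the paper, (lam, gamma) are chosen jointly by minimizing a sample-based estimate of misclassification risk; a small cross-validated grid search over both parameters is a natural way to approximate that.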
Improving Text Classification by Shrinkage in a Hierarchy of Classes
1998
"... When documents are organized in a large number of topic categories, the categories are often arranged in a hierarchy. The U.S. patent database and Yahoo are two examples. ..."
Abstract

Cited by 239 (5 self)
Abstract: When documents are organized in a large number of topic categories, the categories are often arranged in a hierarchy. The U.S. patent database and Yahoo are two examples.
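The abstract is truncated here, but the title points at the standard construction: estimate a leaf class's word distribution as a convex combination of maximum-likelihood estimates taken along the path from the leaf to the root of the hierarchy. A hedged sketch of that idea follows; the function and argument names are hypothetical, and the mixing weights (typically fit by EM on held-out data in this line of work) are taken as given.

```python
import numpy as np

def shrunken_word_probs(path_counts, lams):
    """Shrinkage estimate of P(word | leaf class).

    path_counts: list of word-count vectors over the vocabulary, ordered
                 leaf -> root (each node's aggregated training counts).
    lams:        mixing weights, one per node, summing to 1.
    """
    mles = [c / c.sum() for c in path_counts]       # per-node MLEs
    return sum(l * m for l, m in zip(lams, mles))   # convex combination
```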
Information-theoretic metric learning
In NIPS 2006 Workshop on Learning to Compare Examples, 2007
"... We formulate the metric learning problem as that of minimizing the differential relative entropy between two multivariate Gaussians under constraints on the Mahalanobis distance function. Via a surprising equivalence, we show that this problem can be solved as a lowrank kernel learning problem. Spe ..."
Abstract

Cited by 161 (14 self)
Abstract: We formulate the metric learning problem as that of minimizing the differential relative entropy between two multivariate Gaussians under constraints on the Mahalanobis distance function. Via a surprising equivalence, we show that this problem can be solved as a low-rank kernel learning problem. Specifically, we minimize the Burg divergence of a low-rank kernel to an input kernel, subject to pairwise distance constraints. Our approach has several advantages over existing methods. First, we present a natural information-theoretic formulation for the problem. Second, the algorithm utilizes the methods developed by Kulis et al. [6], which do not involve any eigenvector computation; in particular, the running time of our method is faster than most existing techniques. Third, the formulation offers insights into connections between metric learning and kernel learning.
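For concreteness, a small NumPy sketch of the two quantities the abstract names: the Burg (LogDet) matrix divergence being minimized and the squared Mahalanobis distance being constrained. The full algorithm performs Bregman projections onto one pairwise constraint at a time, which is not reproduced here.

```python
import numpy as np

def burg_divergence(A, B):
    """Burg (LogDet) divergence between symmetric positive-definite matrices:
    D(A, B) = tr(A B^-1) - log det(A B^-1) - n."""
    n = A.shape[0]
    M = A @ np.linalg.inv(B)
    _, logdet = np.linalg.slogdet(M)
    return np.trace(M) - logdet - n

def mahalanobis_sq(x, y, A):
    """Squared Mahalanobis distance d_A(x, y) = (x - y)^T A (x - y)."""
    d = x - y
    return d @ A @ d
```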
Prior distributions for variance parameters in hierarchical models
Bayesian Analysis, 2006
"... Various noninformative prior distributions have been suggested for scale parameters in hierarchical models. We construct a new foldednoncentralt family of conditionally conjugate priors for hierarchical standard deviation parameters, and then consider noninformative and weakly informative priors i ..."
Abstract

Cited by 149 (13 self)
Abstract: Various noninformative prior distributions have been suggested for scale parameters in hierarchical models. We construct a new folded-noncentral-t family of conditionally conjugate priors for hierarchical standard deviation parameters, and then consider noninformative and weakly informative priors in this family. We use an example to illustrate serious problems with the inverse-gamma family of “noninformative” prior distributions. We suggest instead to use a uniform prior on the hierarchical standard deviation, using the half-t family when the number of groups is small and in other settings where a weakly informative prior is desired.
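A small sketch of drawing from the half-t family recommended above, using the fact that a half-t variate is the absolute value of a scaled Student-t; the function name and default parameters are illustrative, not the paper's recommendations for any particular model.

```python
import numpy as np
from scipy import stats

def sample_half_t(size, df=4.0, scale=1.0, rng=None):
    """Draw from a half-t prior for a hierarchical standard deviation:
    sigma = |scale * t_df|.  df -> infinity gives a half-normal;
    df = 1 gives the half-Cauchy special case."""
    rng = np.random.default_rng(rng)
    return np.abs(scale * stats.t.rvs(df, size=size, random_state=rng))
```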
An introduction to boosting and leveraging
Advanced Lectures on Machine Learning, LNCS, 2003
"... ..."
Adaptive wavelet estimation: A block thresholding and oracle inequality approach
Ann. Statist., 1999
"... We study wavelet function estimation via the approach of block thresholding and ideal adaptation with oracle. Oracle inequalities are derived and serve as guides for the selection of smoothing parameters. Based on an oracle inequality and motivated by the data compression and localization properties ..."
Abstract

Cited by 100 (14 self)
Abstract: We study wavelet function estimation via the approach of block thresholding and ideal adaptation with oracle. Oracle inequalities are derived and serve as guides for the selection of smoothing parameters. Based on an oracle inequality and motivated by the data compression and localization properties of wavelets, an adaptive wavelet estimator for nonparametric regression is proposed and the optimality of the procedure is investigated. We show that the estimator achieves three objectives simultaneously: adaptivity, spatial adaptivity and computational efficiency. Specifically, it is proved that the estimator attains the exact optimal rates of convergence over a range of Besov classes and achieves the adaptive local minimax rate for estimating functions at a point. The estimator is easy to implement, at a computational cost of O(n). Simulation shows that the estimator has excellent numerical performance relative to more traditional wavelet estimators.
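A minimal sketch of James-Stein-style block thresholding on one resolution level of noisy wavelet coefficients: each block is shrunk by the factor (1 - lam * L * sigma^2 / S_b^2)_+, where S_b^2 is the block energy. The block length L and constant lam below are placeholders; the paper derives specific choices from its oracle inequality.

```python
import numpy as np

def block_threshold(coeffs, sigma, L=8, lam=4.505):
    """Blockwise James-Stein shrinkage of one level of wavelet coefficients.

    Each length-L block is scaled by (1 - lam*L*sigma^2 / S_b^2)_+,
    where S_b^2 is the sum of squared coefficients in the block.
    """
    out = np.asarray(coeffs, dtype=float).copy()
    for start in range(0, len(out), L):
        block = out[start:start + L]
        energy = np.sum(block ** 2)
        shrink = max(0.0, 1.0 - lam * len(block) * sigma ** 2 / max(energy, 1e-12))
        out[start:start + L] = shrink * block
    return out
```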
ForWaRD: Fourier-Wavelet Regularized Deconvolution for Ill-Conditioned Systems
IEEE Trans. on Signal Processing, 2002
"... We propose an efficient, hybrid FourierWavelet Regularized Deconvolution (ForWaRD) al gorithm that performs noise regularization via scalar shrinkage in both the Fourier and wavelet domains. The Fourier shrinkage exploits the Fourier transform's sparse representation of the colored noise i ..."
Abstract

Cited by 95 (2 self)
Abstract: We propose an efficient, hybrid Fourier-Wavelet Regularized Deconvolution (ForWaRD) algorithm that performs noise regularization via scalar shrinkage in both the Fourier and wavelet domains. The Fourier shrinkage exploits the Fourier transform's sparse representation of the colored noise inherent in deconvolution, while the wavelet shrinkage exploits the wavelet domain's sparse representation of piecewise smooth signals and images. We derive the optimal balance between the amount of Fourier and wavelet regularization by optimizing an approximate mean-squared-error (MSE) metric and find that signals with sparser wavelet representations require less Fourier shrinkage. ForWaRD is applicable to all ill-conditioned deconvolution problems, unlike the purely wavelet-based Wavelet-Vaguelette Deconvolution (WVD), and its estimate features minimal ringing, unlike purely Fourier-based Wiener deconvolution. We analyze ForWaRD's MSE decay rate as the number of samples increases and demonstrate its improved performance compared to the optimal WVD over a wide range of practical sample lengths.
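A hedged sketch of the two-stage structure the abstract describes, for a 1-D signal: a regularized Fourier inversion followed by wavelet shrinkage of the intermediate estimate. The regularization constant alpha, the wavelet settings, and the soft threshold are illustrative placeholders, not the paper's optimized choices (which balance the two stages via an approximate MSE criterion).

```python
import numpy as np
import pywt

def forward_style_deconv(y, h, sigma, alpha=1e-2, wavelet="db4", level=4):
    """Two-stage Fourier-then-wavelet regularized deconvolution sketch.

    y: observed blurred, noisy signal; h: blur kernel, zero-padded to
    the same length; sigma: noise standard deviation.
    """
    # Stage 1: regularized Fourier inversion (Tikhonov/Wiener-style shrinkage).
    Y, H = np.fft.fft(y), np.fft.fft(h)
    X_hat = np.conj(H) * Y / (np.abs(H) ** 2 + alpha)
    x_tilde = np.real(np.fft.ifft(X_hat))
    # Stage 2: wavelet shrinkage of the (colored-noise) intermediate estimate.
    coeffs = pywt.wavedec(x_tilde, wavelet, level=level)
    thresh = 3.0 * sigma  # illustrative threshold, not the paper's rule
    den = [coeffs[0]] + [pywt.threshold(c, thresh, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(den, wavelet)
```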
Regularized estimation of large covariance matrices
Ann. Statist., 2008
"... This paper considers estimating a covariance matrix of p variables from n observations by either banding or tapering the sample covariance matrix, or estimating a banded version of the inverse of the covariance. We show that these estimates are consistent in the operator norm as long as (log p)/n → ..."
Abstract

Cited by 92 (13 self)
Abstract: This paper considers estimating a covariance matrix of p variables from n observations by either banding or tapering the sample covariance matrix, or by estimating a banded version of the inverse of the covariance. We show that these estimates are consistent in the operator norm as long as (log p)/n → 0, and obtain explicit rates. The results are uniform over some fairly natural well-conditioned families of covariance matrices. We also introduce an analogue of the Gaussian white noise model and show that if the population covariance is embeddable in that model and well-conditioned, then the banded approximations produce consistent estimates of the eigenvalues and associated eigenvectors of the covariance matrix. The results can be extended to smooth versions of banding and to non-Gaussian distributions with sufficiently short tails. A resampling approach is proposed for choosing the banding parameter in practice. This approach is illustrated numerically on both simulated and real data.
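Banding is simple to state in code: zero out every entry of the sample covariance more than k positions off the diagonal. A minimal NumPy sketch; the paper's resampling scheme for choosing k is not reproduced here.

```python
import numpy as np

def band(S, k):
    """Banding operator B_k: keep S[i, j] only where |i - j| <= k."""
    p = S.shape[0]
    i, j = np.indices((p, p))
    return np.where(np.abs(i - j) <= k, S, 0.0)

# Example: band the sample covariance of n = 200 observations of p = 50 variables.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
S = np.cov(X, rowvar=False)
S_banded = band(S, k=3)
```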
Random Cascades on Wavelet Trees and Their Use in Analyzing and Modeling Natural Images
Applied and Computational Harmonic Analysis, 2001
"... in signal and image processing, including image denoising, coding, and superresolution. # 2001 Academic Press 1. INTRODUCTION Stochastic models of natural images underlie a variety of applications in image processing and lowlevel computer vision, including image coding, denoising and 1 MW supp ..."
Abstract

Cited by 88 (15 self)
Abstract: … in signal and image processing, including image denoising, coding, and super-resolution.
1. Introduction. Stochastic models of natural images underlie a variety of applications in image processing and low-level computer vision, including image coding, denoising and restoration, interpolation and synthesis. Accordingly, the past decade has witnessed an increasing amount of research devoted to developing stochastic models of images (e.g., [19, 38, 45, 48, 55]).
Risk reduction in large portfolios: Why imposing the wrong constraints helps
2002
"... Green and Hollifield (1992) argue that the presence of a dominant factor is why we observe extreme negative weights in meanvarianceefficient portfolios constructed using sample moments. In that case imposing noshortsale constraints should hurt whereas empirical evidence is often to the contrary. ..."
Abstract

Cited by 85 (3 self)
Abstract: Green and Hollifield (1992) argue that the presence of a dominant factor is why we observe extreme negative weights in mean-variance-efficient portfolios constructed using sample moments. In that case, imposing no-short-sale constraints should hurt, whereas empirical evidence is often to the contrary. We reconcile this apparent contradiction. We explain why constraining portfolio weights to be nonnegative can reduce the risk in estimated optimal portfolios even when the constraints are wrong. Surprisingly, with no-short-sale constraints in place, the sample covariance matrix performs as well as covariance matrix estimates based on factor models, shrinkage estimators, and daily data.
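A minimal sketch of the constrained problem under discussion: the minimum-variance portfolio min_w w' Sigma w subject to the weights summing to one, with the no-short-sale constraint imposed as nonnegativity bounds. The solver choice (SLSQP via SciPy) is an illustrative assumption, not the paper's procedure.

```python
import numpy as np
from scipy.optimize import minimize

def min_var_weights(Sigma, no_short=True):
    """Minimum-variance portfolio: minimize w' Sigma w subject to
    sum(w) = 1, and optionally w >= 0 (the no-short-sale constraint)."""
    p = Sigma.shape[0]
    w0 = np.full(p, 1.0 / p)                                  # equal-weight start
    cons = [{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}]  # budget constraint
    bounds = [(0.0, None)] * p if no_short else None           # w >= 0 if no shorting
    res = minimize(lambda w: w @ Sigma @ w, w0,
                   constraints=cons, bounds=bounds, method="SLSQP")
    return res.x
```

Comparing the risk of these weights when Sigma is the raw sample covariance versus a factor-model or shrinkage estimate, with and without no_short, mirrors the comparison the abstract summarizes.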