Results 1  10
of
128
Model Selection and the Principle of Minimum Description Length
 Journal of the American Statistical Association
, 1998
"... This paper reviews the principle of Minimum Description Length (MDL) for problems of model selection. By viewing statistical modeling as a means of generating descriptions of observed data, the MDL framework discriminates between competing models based on the complexity of each description. This ..."
Abstract

Cited by 156 (5 self)
 Add to MetaCart
This paper reviews the principle of Minimum Description Length (MDL) for problems of model selection. By viewing statistical modeling as a means of generating descriptions of observed data, the MDL framework discriminates between competing models based on the complexity of each description. This approach began with Kolmogorov's theory of algorithmic complexity, matured in the literature on information theory, and has recently received renewed interest within the statistics community. In the pages that follow, we review both the practical as well as the theoretical aspects of MDL as a tool for model selection, emphasizing the rich connections between information theory and statistics. At the boundary between these two disciplines, we find many interesting interpretations of popular frequentist and Bayesian procedures. As we will see, MDL provides an objective umbrella under which rather disparate approaches to statistical modeling can coexist and be compared. We illustrate th...
Nonparametric regression using Bayesian variable selection
 Journal of Econometrics
, 1996
"... This paper estimates an additive model semiparametrically, while automatically selecting the significant independent variables and the app~opriatc power transformation of the dependent variable. The nonlinear variables arc modeled as regression splincs, with significant knots selected fiom a large ..."
Abstract

Cited by 153 (15 self)
 Add to MetaCart
This paper estimates an additive model semiparametrically, while automatically selecting the significant independent variables and the app~opriatc power transformation of the dependent variable. The nonlinear variables arc modeled as regression splincs, with significant knots selected fiom a large number of candidate knots. The estimation is made robust by modeling the errors as a mixture of normals. A Bayesian approach is used to select the significant knots, the power transformation, and to identify oatliers using the Gibbs sampler to curry out the computation. Empirical evidence is given that the sampler works well on both simulated and real examples and that in the univariate case it compares faw)rably with a kernelweighted local linear smoother, The variable selection algorithm in the paper is substantially fasler than previous Bayesian variable sclcclion algorithms. K('I ' word~': Additive nlodel, Pov¢¢r Iransformalio:l: Robust cslinlalion
Bayesian Compressive Sensing
, 2007
"... The data of interest are assumed to be represented as Ndimensional real vectors, and these vectors are compressible in some linear basis B, implying that the signal can be reconstructed accurately using only a small number M ≪ N of basisfunction coefficients associated with B. Compressive sensing ..."
Abstract

Cited by 150 (15 self)
 Add to MetaCart
The data of interest are assumed to be represented as Ndimensional real vectors, and these vectors are compressible in some linear basis B, implying that the signal can be reconstructed accurately using only a small number M ≪ N of basisfunction coefficients associated with B. Compressive sensing is a framework whereby one does not measure one of the aforementioned Ndimensional signals directly, but rather a set of related measurements, with the new measurements a linear combination of the original underlying Ndimensional signal. The number of required compressivesensing measurements is typically much smaller than N, offering the potential to simplify the sensing system. Let f denote the unknown underlying Ndimensional signal, and g a vector of compressivesensing measurements, then one may approximate f accurately by utilizing knowledge of the (underdetermined) linear relationship between f and g, in addition to knowledge of the fact that f is compressible in B. In this paper we employ a Bayesian formalism for estimating the underlying signal f based on compressivesensing measurements g. The proposed framework has the following properties: (i) in addition to estimating the underlying signal f, “error bars ” are also estimated, these giving a measure of confidence in the inverted signal; (ii) using knowledge of the error bars, a principled means is provided for determining when a sufficient
Adapting to unknown sparsity by controlling the false discovery rate
, 2000
"... We attempt to recover a highdimensional vector observed in white noise, where the vector is known to be sparse, but the degree of sparsity is unknown. We consider three different ways of defining sparsity of a vector: using the fraction of nonzero terms; imposing powerlaw decay bounds on the order ..."
Abstract

Cited by 122 (18 self)
 Add to MetaCart
We attempt to recover a highdimensional vector observed in white noise, where the vector is known to be sparse, but the degree of sparsity is unknown. We consider three different ways of defining sparsity of a vector: using the fraction of nonzero terms; imposing powerlaw decay bounds on the ordered entries; and controlling the ℓp norm for p small. We obtain a procedure which is asymptotically minimax for ℓr loss, simultaneously throughout a range of such sparsity classes. The optimal procedure is a dataadaptive thresholding scheme, driven by control of the False Discovery Rate (FDR). FDR control is a recent innovation in simultaneous testing, in which one seeks to ensure that at most a certain fraction of the rejected null hypotheses will correspond to false rejections. In our treatment, the FDR control parameter q also plays a controlling role in asymptotic minimaxity. Our results say that letting q = qn → 0 with problem size n is sufficient for asymptotic minimaxity, while keeping fixed q>1/2prevents asymptotic minimaxity. To our knowledge, this relation between ideas in simultaneous inference and asymptotic decision theory is new. Our work provides a new perspective on a class of model selection rules which has been introduced recently by several authors. These new rules impose complexity penalization of the form 2·log ( potential model size / actual model size). We exhibit a close connection with FDRcontrolling procedures having q tending to 0; this connection strongly supports a conjecture of simultaneous asymptotic minimaxity for such model selection rules.
The practical implementation of Bayesian model selection
 Institute of Mathematical Statistics
, 2001
"... In principle, the Bayesian approach to model selection is straightforward. Prior probability distributions are used to describe the uncertainty surrounding all unknowns. After observing the data, the posterior distribution provides a coherent post data summary of the remaining uncertainty which is r ..."
Abstract

Cited by 94 (3 self)
 Add to MetaCart
In principle, the Bayesian approach to model selection is straightforward. Prior probability distributions are used to describe the uncertainty surrounding all unknowns. After observing the data, the posterior distribution provides a coherent post data summary of the remaining uncertainty which is relevant for model selection. However, the practical implementation of this approach often requires carefully tailored priors and novel posterior calculation methods. In this article, we illustrate some of the fundamental practical issues that arise for two different model selection problems: the variable selection problem for the linear model and the CART model selection problem.
Empirical Bayes Selection of Wavelet Thresholds
 ANN. STATIST
, 2005
"... This paper explores a class of empirical Bayes methods for leveldependent threshold selection in wavelet shrinkage. The prior considered for each wavelet coefficient is a mixture of an atom of probability at zero and a heavytailed density. The mixing weight, or sparsity parameter, for each lev ..."
Abstract

Cited by 93 (3 self)
 Add to MetaCart
This paper explores a class of empirical Bayes methods for leveldependent threshold selection in wavelet shrinkage. The prior considered for each wavelet coefficient is a mixture of an atom of probability at zero and a heavytailed density. The mixing weight, or sparsity parameter, for each level of the transform is chosen by marginal maximum likelihood. If estimation
Wavelet estimators in nonparametric regression: a comparative simulation study
 Journal of Statistical Software
, 2001
"... OATAO is an open access repository that collects the work of Toulouse researchers and makes it freely available over the web where possible. ..."
Abstract

Cited by 86 (17 self)
 Add to MetaCart
OATAO is an open access repository that collects the work of Toulouse researchers and makes it freely available over the web where possible.
Flexible empirical Bayes estimation for wavelets
 Journal of the Royal Statistics Society, Series B
, 2000
"... Wavelet shrinkage estimation is an increasingly popular method for signal denoising and compression. Although Bayes estimators can provide excellent mean squared error (MSE) properties, selection of an effective prior is a difficult task. To address this problem, we propose Empirical Bayes (EB) prio ..."
Abstract

Cited by 73 (13 self)
 Add to MetaCart
Wavelet shrinkage estimation is an increasingly popular method for signal denoising and compression. Although Bayes estimators can provide excellent mean squared error (MSE) properties, selection of an effective prior is a difficult task. To address this problem, we propose Empirical Bayes (EB) prior selection methods for various error distributions including the normal and the heavier tailed Student t distributions. Under such EB prior distributions, we obtain threshold shrinkage estimators based on model selection, and multiple shrinkage estimators based on model averaging. These EB estimators are seen to be computationally competitive with standard classical thresholding methods, and to be robust to outliers in both the data and wavelet domains. Simulated and real examples are used to illustrate the flexibility and improved MSE performance of these methods in a wide variety of settings.