## Minimum Message Length Shrinkage Estimation

### BibTeX

```bibtex
@misc{Makalic_minimummessage,
  author = {Enes Makalic and Daniel F. Schmidt},
  title  = {Minimum Message Length Shrinkage Estimation},
  year   = {}
}
```

### Abstract

This note considers estimation of the mean of a multivariate Gaussian distribution with known variance within the Minimum Message Length (MML) framework. Interestingly, the resulting MML estimator exactly coincides with the positive-part James-Stein estimator under the choice of an uninformative prior. A new approach for estimating parameters and hyperparameters in general hierarchical Bayes models is also presented.

### Citations

1160 | Modeling by shortest data description - Rissanen - 1978

> Citation Context: ...(Wallace and Dowe, 2000); situations where the method of maximum likelihood (ML) fails. Note that the MML criterion is similar to the Minimum Description Length (MDL) criterion developed by Rissanen (Rissanen, 1978, 1996), but differs on several technical and philosophical details. Perhaps most importantly, under the MML principle a model is required to be fully specified in the sense that all parameters must...

311 | An Information Measure for Classification - Wallace, Boulton - 1968

> Citation Context: ...examples with multiple hyperparameters are discussed in Section 5. Concluding remarks are given in Section 6. 2. Inference by Minimum Message Length: Under the Minimum Message Length (MML) principle (Wallace and Boulton, 1968; Wallace, 2005), inference is performed by seeking the model that admits the briefest encoding (or most compression) of a message transmitted from an imaginary sender to an imaginary receiver. The me...

275 | Fisher information and stochastic complexity - Rissanen - 1996

263 | Estimation with quadratic loss - James, Stein - 1961

> Citation Context: ...ast-squares estimator is not admissible and is in fact dominated by a large class of minimax estimators. The most well known of these dominating estimators is the positive-part James-Stein estimator (James and Stein, 1961): ˆµJS(x) = (1 − (k − 2)/(x′x))+ x, where (·)+ = max(0, ·). Estimators in the James-Stein class tend to shrink towards some origin (in this case zero) and hence are usually referred to as shrinkage e...
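
The positive-part James-Stein estimator quoted in this context is simple to implement. The sketch below is a minimal NumPy version (the helper name `js_positive_part` is ours, not from the paper) that applies the shrinkage factor (1 − (k − 2)/x′x)+ to the observation.

```python
import numpy as np

def js_positive_part(x):
    """Positive-part James-Stein estimate of a Gaussian mean.

    Shrinks the observation x in R^k towards the origin by the
    factor (1 - (k - 2) / x'x)+, with (.)+ = max(0, .).
    """
    k = x.shape[0]
    shrink = max(0.0, 1.0 - (k - 2) / (x @ x))
    return shrink * x

# A large-norm observation is shrunk only slightly:
# k = 3, x'x = 9, factor = 1 - 1/9 = 8/9.
print(js_positive_part(np.array([3.0, 0.0, 0.0])))  # -> [8/3, 0, 0]

# A small-norm observation is pulled all the way to the origin:
# x'x = 0.75 < k - 2 = 1, so the factor is clipped to zero.
print(js_positive_part(np.array([0.5, 0.5, 0.5])))  # -> [0, 0, 0]
```

The positive-part clipping is what distinguishes this estimator from the raw James-Stein rule, which can overshoot past the origin when x′x is small.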

187 | Estimation and inference by compact coding - Wallace, Freeman - 1987

> Citation Context: ...1975) criterion yields an exact solution to this minimisation problem but is computationally intractable (Farr and Wallace, 2002). The most popular alternative to SMML is Wallace and Freeman’s MML87 (Wallace and Freeman, 1987) approximation. Here, under suitable regularity conditions the length of the message transmitting data x using model θ is approximated by: I87(x, θ) = − log π(θ) + (1/2) log |Jθ(θ)| + (k/2) log κk + ...
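
The MML87 codelength in the snippet above is truncated. A minimal numerical sketch for the one-parameter (k = 1) Gaussian-mean case is given below, assuming the standard remaining terms of the MML87 formula (k/2 − log p(x|θ)), the quantisation constant κ1 = 1/12, and, purely for illustration, a uniform prior on [−10, 10]; the function name `mml87_codelength` is ours.

```python
import numpy as np

def mml87_codelength(x, mu):
    """MML87 codelength (in nits) for the mean mu of n N(mu, 1) data.

    Terms follow the quoted approximation, with J(mu) = n (Fisher
    information), kappa_1 = 1/12, and an assumed uniform prior on
    [-10, 10] used purely for illustration.
    """
    n = len(x)
    neg_log_prior = np.log(20.0)                   # -log pi(mu)
    half_log_fisher = 0.5 * np.log(n)              # (1/2) log |J(mu)|
    quantisation = 0.5 * np.log(1.0 / 12.0) + 0.5  # (k/2) log kappa_k + k/2
    neg_log_lik = 0.5 * n * np.log(2 * np.pi) + 0.5 * np.sum((x - mu) ** 2)
    return neg_log_prior + half_log_fisher + quantisation + neg_log_lik

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=50)

# Only the likelihood term depends on mu, so the codelength is
# minimised at the sample mean (to grid resolution).
grid = np.linspace(-5, 5, 2001)
best = grid[np.argmin([mml87_codelength(x, m) for m in grid])]
print(best, x.mean())
```

In this simple setting the MML87 point estimate coincides with the maximum likelihood estimate; the shrinkage behaviour discussed in the abstract arises only once hierarchical priors over hyperparameters are introduced.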

176 | Inadmissibility of the Usual Estimator for the Mean of a Multivariate Normal Distribution - Stein - 1956

146 | Model Selection and the Principle of Minimum Description Length - Hansen, Yu - 2001

> Citation Context: ...he data is distributed as per Nk(0k, Ik) and prepends the code with an indicator variable selecting the appropriate coding scheme. This solution inflates the codelength in both cases by (log 2) nits (Hansen and Yu, 2001). Remark 2: Behaviour for k < 3. For k < 3, it is straightforward to show that ∂I87(·)/∂c < 0 for 0 < c < ∞, yielding an estimate ĉ87(x) = ∞ in the limit. Under this diffuse choice of hyperparameter,...

142 | The Minimum Description Length Principle - Grünwald - 2007

86 | Statistical and Inductive Inference by Minimum Message Length - Wallace - 2005

> Citation Context: ...erparameters are discussed in Section 5. Concluding remarks are given in Section 6. 2. Inference by Minimum Message Length: Under the Minimum Message Length (MML) principle (Wallace and Boulton, 1968; Wallace, 2005), inference is performed by seeking the model that admits the briefest encoding (or most compression) of a message transmitted from an imaginary sender to an imaginary receiver. The message is transm...

72 | Data analysis using Stein’s estimator and its generalizations - Efron, Morris - 1975

26 | An Invariant Bayes Method for Point Estimation - Wallace, Boulton - 1975

> Citation Context: ...ruct a codebook over the space (Θ, X) which minimises the average cost of transmitting a dataset drawn from the marginal distribution r(x) = ∫ π(θ)p(x|θ)dθ. The Strict Minimum Message Length (SMML) (Wallace and Boulton, 1975) criterion yields an exact solution to this minimisation problem but is computationally intractable (Farr and Wallace, 2002). The most popular alternative to SMML is Wallace and Freeman’s MML87 (Wall...

21 | Combining possibly related estimation problems - Efron, Morris - 1973

19 | Empirical Bayes Methods - Maritz - 1970

> Citation Context: ...Replacing the finite sample codelength with the asymptotic codelength leads to an inconsistent estimate. Using an inconsistent estimate of c (for example, the marginal maximum likelihood estimator (Maritz, 1970)) leads to a greater squared-error risk. This is analogous to the MML solution for the conventional Neyman-Scott problem (Dowe and Wallace, 1996). 6. Conclusion: This paper has examined the task of e...

13 | Choice of hierarchical priors: admissibility in estimation of normal means - Berger, Strawderman - 1996

> Citation Context: ...of the estimate. It remains to determine a suitable prior density over c. Lacking prior knowledge of the distribution of µ, the authors opt for the traditionally uninformative uniform prior π(c) ∝ 1 (Berger and Strawderman, 1996) (also suggested by Stein (1962), given that c is essentially a parameter describing µ′µ) over some suitable support. The complete message length is now I87(x, µ, c) = (1/2)(x − µ)′(x − µ) + ...

11 | The complexity of strict minimum message length inference - Farr, Wallace - 2002

> Citation Context: ...ibution r(x) = ∫ π(θ)p(x|θ)dθ. The Strict Minimum Message Length (SMML) (Wallace and Boulton, 1975) criterion yields an exact solution to this minimisation problem but is computationally intractable (Farr and Wallace, 2002). The most popular alternative to SMML is Wallace and Freeman’s MML87 (Wallace and Freeman, 1987) approximation. Here, under suitable regularity conditions the length of the message transmitting data...

9 | Resolving the Neyman-Scott problem by Minimum Message Length - Dowe, Wallace - 1997

> Citation Context: ...ise the expected behaviour of datasets generated by the point estimate under consideration. This approach leads to consistent estimates of nuisance parameters in the case of the Neyman-Scott problem (Dowe and Wallace, 1996) and class labelling in mixture models (Wallace and Dowe, 2000); situations where the method of maximum likelihood (ML) fails. Note that the MML criterion is similar to the Minimum Description Length...

8 | Confidence sets for the mean of a multivariate normal distribution (with discussion) - Stein - 1962

> Citation Context: ...problem with multiple hyperparameters in the next section. 5. Shrinkage Towards a Grand Mean: The following extension to the basic JS shrinkage estimator was proposed by Lindley in the discussion of (Stein, 1962) and has been applied to several problems by Efron and Morris (1973, 1975). Lindley suggests that instead of shrinking to the origin, one may wish to shrink the parameters to another point in the par...

8 | Single-factor analysis by minimum message length estimation - Wallace, Freeman - 1992

2 | The empirical Bayes approach to statistical decision problems - unknown authors - 1964

1 | Theory of Point Estimation, 4th Edition. Springer Texts in Statistics - Lehmann, Casella - 2003

> Citation Context: ...of inferring the mean µ from a single observation x ∈ Rk of the random variable X. It is well known that the uniformly minimum variance unbiased (UMVU) estimator of µ is the least squares estimate (Lehmann and Casella, 2003) given by ˆµLS(x) = x. This estimator is minimax under the squared error loss function and is equivalent to the maximum likelihood estimator. Remarkably, Stein (1956) has demonstrated that for k ≥ 3, ...

1 | Statistical Decision Functions, 2nd Edition. Chelsea Pub Co - Wald - 1971

> Citation Context: ...riable distributed according to a multivariate Gaussian density X ∼ Nk(µ, Σ) with an unknown mean µ ∈ Rk and a known variance Σ = Ik. The accuracy, or risk, of an estimator ˆµ(x) of µ is defined as (Wald, 1971): R(µ, ˆµ(x)) = Ex[L(µ, ˆµ(x))], where L(·) ≥ 0 is the squared error loss function: L(µ, ˆµ(x)) = (ˆµ(x) − µ)′(ˆµ(x) − µ). The task is to find an estimator ˆµ(x) which minimises the risk for all v...
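
The squared-error risk defined in this context is easy to approximate by Monte Carlo. The sketch below (NumPy; the dimension k, trial count, and helper name are illustrative choices of ours) compares the least-squares estimator ˆµLS(x) = x against the positive-part James-Stein estimator at µ = 0, where the shrinkage advantage described by Stein (1956) for k ≥ 3 is largest.

```python
import numpy as np

def js_positive_part(x):
    """Positive-part James-Stein estimate, shrinking x towards zero."""
    k = x.shape[0]
    return max(0.0, 1.0 - (k - 2) / (x @ x)) * x

rng = np.random.default_rng(1)
k, trials = 10, 20000
mu = np.zeros(k)  # true mean at the origin, the most favourable case for JS

loss_ls = loss_js = 0.0
for _ in range(trials):
    x = rng.normal(loc=mu, scale=1.0)          # one draw of X ~ N_k(mu, I_k)
    loss_ls += np.sum((x - mu) ** 2)           # L(mu, muhat_LS), muhat_LS = x
    e = js_positive_part(x) - mu
    loss_js += e @ e                           # L(mu, muhat_JS)

print(loss_ls / trials)  # close to k = 10, the risk of the LS estimator
print(loss_js / trials)  # well below k: the shrinkage gain at the origin
```

Repeating the experiment at other values of µ shows the gap narrowing as ||µ|| grows, consistent with the James-Stein estimator dominating, but never being beaten by, the least-squares rule.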