Results 1–10 of 11
The minimum description length principle in coding and modeling
 IEEE Trans. Inform. Theory, 1998
Cited by 305 (12 self)
Abstract — We review the principles of Minimum Description Length and Stochastic Complexity as used in data compression and statistical modeling. Stochastic complexity is formulated as the solution to optimum universal coding problems extending Shannon’s basic source coding theorem. The normalized maximized likelihood, mixture, and predictive codings are each shown to achieve the stochastic complexity to within asymptotically vanishing terms. We assess the performance of the minimum description length criterion both from the vantage point of quality of data compression and accuracy of statistical inference. Context tree modeling, density estimation, and model selection in Gaussian linear regression serve as examples. Index Terms—Complexity, compression, estimation, inference, universal modeling.
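As a concrete illustration of the Gaussian linear regression example above, the sketch below scores polynomial orders with a simplified two-part code length, (n/2) log(RSS/n) + (k/2) log n. This BIC-like penalty is an assumption of this example standing in for the paper's normalized-maximized-likelihood form, not the paper's own criterion:

```python
import math
import random

# Order selection by a simplified two-part MDL criterion (hypothetical
# stand-in for NML-based stochastic complexity). Data: quadratic + noise.
random.seed(0)
n = 200
xs = [i / n for i in range(n)]
ys = [1.0 + 2.0 * x - 3.0 * x * x + random.gauss(0, 0.1) for x in xs]

def solve(A, b):
    """Gauss-Jordan solution of a small linear system A a = b."""
    k = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(k):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [M[r][t] - f * M[c][t] for t in range(k + 1)]
    return [M[i][k] / M[i][i] for i in range(k)]

def mdl_cost(k):
    """(n/2) log(RSS/n) + (k/2) log n for a k-coefficient polynomial fit."""
    X = [[x ** j for j in range(k)] for x in xs]
    XtX = [[sum(r[a] * r[b] for r in X) for b in range(k)] for a in range(k)]
    Xty = [sum(r[a] * y for r, y in zip(X, ys)) for a in range(k)]
    coef = solve(XtX, Xty)
    rss = sum((y - sum(c * v for c, v in zip(coef, r))) ** 2
              for r, y in zip(X, ys))
    return 0.5 * n * math.log(rss / n) + 0.5 * k * math.log(n)

# Typically selects the true order (3 coefficients for a quadratic).
best_k = min(range(1, 7), key=mdl_cost)
```

The (k/2) log n term is the familiar asymptotic parameter cost; the NML form the paper develops replaces it with the exact parametric complexity.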
Universal prediction
 IEEE Transactions on Information Theory, 1998
Cited by 136 (11 self)
Abstract — This paper consists of an overview on universal prediction from an information-theoretic perspective. Special attention is given to the notion of probability assignment under the self-information loss function, which is directly related to the theory of universal data compression. Both the probabilistic setting and the deterministic setting of the universal prediction problem are described, with emphasis on the analogy and the differences between results in the two settings. Index Terms — Bayes envelope, entropy, finite-state machine, linear prediction, loss function, probability assignment, redundancy-capacity, stochastic complexity, universal coding, universal prediction.
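A standard sequential probability assignment for the self-information loss mentioned above is the Krichevsky–Trofimov (add-1/2) estimator. This sketch (an example of mine, not the paper's) checks that its cumulative code length stays within the classic (1/2) log₂ n + 1 bit redundancy of the empirical entropy:

```python
import math

def kt_codelength(bits):
    """Code length (bits) of the Krichevsky-Trofimov sequential
    probability assignment under self-information (log) loss."""
    zeros = ones = 0
    total = 0.0
    for b in bits:
        p_one = (ones + 0.5) / (zeros + ones + 1.0)  # add-1/2 estimator
        total -= math.log2(p_one if b else 1.0 - p_one)
        ones += b
        zeros += 1 - b
    return total

seq = [1, 0, 1, 1] * 50                # 200 bits, empirical P(1) = 0.75
n = len(seq)
emp = sum(seq) / n
h = -(emp * math.log2(emp) + (1 - emp) * math.log2(1 - emp))
L = kt_codelength(seq)
# L exceeds n*h by at most about (1/2) log2 n + 1 bits (KT redundancy bound)
```

The same code length, read as a compressed size, is why probability assignment under log loss and universal data compression are two views of one problem.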
Minimum Description Length Induction, Bayesianism, and Kolmogorov Complexity
 IEEE Transactions on Information Theory, 1998
Cited by 67 (7 self)
The relationship between the Bayesian approach and the minimum description length approach is established. We sharpen and clarify the general modeling principles MDL and MML, abstracted as the ideal MDL principle and defined from Bayes's rule by means of Kolmogorov complexity. The basic condition under which the ideal principle should be applied is encapsulated as the Fundamental Inequality, which in broad terms states that the principle is valid when the data are random, relative to every contemplated hypothesis, and these hypotheses are in turn random relative to the (universal) prior. Basically, the ideal principle states that the prior probability associated with the hypothesis should be given by the algorithmic universal probability, and that the sum of the negative log universal probability of the model and the negative log probability of the data given the model should be minimized. If we restrict the model class to the finite sets, then application of the ideal principle turns into Kolmogorov's mi...
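Kolmogorov complexity is uncomputable, so the ideal principle can only be illustrated with computable surrogates. The toy sketch below scores "repeating pattern" hypotheses for a byte string by a two-part code length (model bits plus data-given-model bits); the specific code-length assignments are assumptions of this example, not anything from the paper:

```python
import math

def two_part_length(data: bytes, p: int) -> float:
    """Two-part code length for the hypothesis 'data repeats with period p':
    describe the pattern and the period (model), then the exceptions
    (data given model). A crude, computable stand-in for K(H) + K(D|H)."""
    pattern = data[:p]
    mismatches = sum(1 for i, b in enumerate(data) if b != pattern[i % p])
    n = len(data)
    model_bits = 8 * p + math.log2(n)             # pattern bytes + period
    data_bits = mismatches * (math.log2(n) + 8)   # position + value per error
    return model_bits + data_bits

data = b"0110" * 64
# The ideal-MDL-style choice: the hypothesis with the shortest total code.
best_p = min(range(1, 17), key=lambda p: two_part_length(data, p))
```

For this regular string the period-4 hypothesis wins: it compresses the data to a short model with no exceptions, exactly the trade-off the ideal principle formalizes.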
Unsupervised Contour Representation and Estimation Using B-Splines and a Minimum Description Length Criterion
 IEEE Trans. on Image Processing, 2000
Cited by 32 (3 self)
This paper describes a new approach to adaptive estimation of parametric deformable contours based on B-spline representations. The problem is formulated in a statistical framework with the likelihood function being derived from a region-based image model. The parameters of the image model, the contour parameters, and the B-spline parameterization order (i.e., the number of control points) are all considered unknown. The parameterization order is estimated via a minimum description length (MDL) type criterion. A deterministic iterative algorithm is developed to implement the derived contour estimation criterion. The result is an unsupervised parametric deformable contour: it adapts its degree of smoothness/complexity (number of control points) and it also estimates the observation (image) model parameters. The experiments reported in the paper, performed on synthetic and real (medical) images, confirm the adequacy and good performance of the approach.
Unsupervised Image Restoration and Edge Location Using Compound Gauss-Markov Random Fields and the MDL Principle
 IEEE Trans. Image Processing, 1997
Cited by 28 (10 self)
Discontinuity-preserving Bayesian image restoration typically involves two Markov random fields: one representing the image intensities/gray levels to be recovered and another one signaling discontinuities/edges to be preserved. The usual strategy is to perform joint maximum a posteriori (MAP) estimation of the image and its edges, which requires the specification of priors for both fields. In this paper, instead of taking an edge prior, we interpret discontinuities (in fact their locations) as deterministic unknown parameters of the compound Gauss-Markov random field (CGMRF), which is assumed to model the intensities. This strategy should allow inferring the discontinuity locations directly from the image with no further assumptions. However, an additional problem emerges: The number of parameters (edges) is unknown. To deal with it, we invoke the minimum description length (MDL) principle; according to MDL, the best edge configuration is the one that allows the shortest description of the image and its edges. Taking the other model parameters (noise and CGMRF variances) also as unknown, we propose a new unsupervised discontinuity-preserving image restoration criterion. Implementation is carried out by a continuation-type iterative algorithm which provides estimates of the number of discontinuities, their locations, the noise variance, the original image variance, and the original image itself (restored image). Experimental results with real and synthetic images are reported.
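A 1-D caricature of the same MDL edge-selection idea: choose step-edge locations in a noisy piecewise-constant signal by minimizing a fit cost plus a per-segment description penalty. The penalty constant and the known noise level are assumptions of this sketch, not the paper's criterion:

```python
import math
import random

# Noisy piecewise-constant signal with edges at positions 30 and 70.
random.seed(1)
SIGMA = 0.3
y = [m + random.gauss(0, SIGMA)
     for m in [0.0] * 30 + [2.0] * 40 + [-1.0] * 30]
n = len(y)

# Prefix sums give O(1) residual sum of squares for any segment.
ps, ps2 = [0.0], [0.0]
for v in y:
    ps.append(ps[-1] + v)
    ps2.append(ps2[-1] + v * v)

def seg_rss(i, j):
    s, m = ps[j] - ps[i], j - i
    return (ps2[j] - ps2[i]) - s * s / m

# Hypothetical per-segment penalty: code the boundary and the segment mean.
PENALTY = 3.0 * math.log(n)

# DP: best[j] = shortest description of y[:j]; cut[j] = last segment start.
best = [0.0] + [float("inf")] * n
cut = [0] * (n + 1)
for j in range(1, n + 1):
    for i in range(j):
        c = best[i] + seg_rss(i, j) / (2 * SIGMA ** 2) + PENALTY
        if c < best[j]:
            best[j], cut[j] = c, i

# Recover the segment boundaries from the back-pointers.
bounds, j = [], n
while j > 0:
    bounds.append(j)
    j = cut[j]
bounds.sort()
```

The shortest-description trade-off is visible directly: each extra edge pays PENALTY, so it is kept only when it reduces the fit cost by more than that.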
Model selection for geometric inference
 Proc. 5th Asian Conf. Comput. Vision, 2002
Cited by 10 (3 self)
Contrasting “geometric fitting”, for which the noise level is taken as the asymptotic variable, with “statistical inference”, for which the number of observations is taken as the asymptotic variable, we give a new definition of the “geometric AIC” and the “geometric MDL” as the counterparts of Akaike’s AIC and Rissanen’s MDL. We discuss various theoretical and practical problems that emerge from our analysis. Finally, we experimentally show that the geometric AIC and the geometric MDL have very different characteristics.
MDL denoising revisited
 IEEE Transactions on Signal Processing, 57(9):3347–3360, 2009
Cited by 5 (2 self)
Abstract — We refine and extend an earlier MDL denoising criterion for wavelet-based denoising. We start by showing that the denoising problem can be reformulated as a clustering problem, where the goal is to obtain separate clusters for informative and noninformative wavelet coefficients, respectively. This suggests two refinements: adding a code length for the model index, and extending the model in order to account for subband-dependent coefficient distributions. A third refinement is the derivation of soft thresholding inspired by predictive universal coding with weighted mixtures. We propose a practical method incorporating all three refinements, which is shown to achieve good performance and robustness in denoising both artificial and natural signals. Index Terms — Minimum description length (MDL) principle, wavelets, denoising.
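The soft-thresholding rule referred to above has a simple closed form. The sketch below pairs it with the classic universal threshold σ√(2 log n), which is an assumption of this example, not the mixture-derived threshold the paper proposes:

```python
import math

def soft_threshold(w, t):
    """Soft thresholding: shrink a wavelet coefficient toward zero by t,
    zeroing anything with magnitude below t."""
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0

# Toy coefficient vector: a few informative coefficients amid small noise.
coeffs = [5.2, -3.1, 0.4, -0.2, 0.1, 6.0, -0.3, 0.05]
sigma, n = 0.3, len(coeffs)
t = sigma * math.sqrt(2 * math.log(n))   # universal threshold (assumed here)
denoised = [soft_threshold(w, t) for w in coeffs]
kept = sum(1 for w in denoised if w != 0.0)
```

The surviving coefficients form the "informative" cluster in the paper's reformulation; the zeroed ones are treated as noise and cost nothing to encode.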
Joint fixed-rate universal lossy coding and identification of continuous-alphabet memoryless sources
 IEEE Trans. Inform. Theory
Cited by 2 (1 self)
The problem of joint universal source coding and identification is considered in the setting of fixed-rate lossy coding of continuous-alphabet memoryless sources. For a wide class of bounded distortion measures, it is shown that any compactly parametrized family of R^d-valued i.i.d. sources with absolutely continuous distributions satisfying appropriate smoothness and Vapnik–Chervonenkis learnability conditions admits a joint scheme for universal lossy block coding and parameter estimation, such that when the block length n tends to infinity, the overhead per-letter rate and the distortion redundancies converge to zero as O(n^{-1} log n) and O(√(n^{-1} log n)), respectively. Moreover, the active source can be determined at the decoder up to a ball of radius O(√(n^{-1} log n)) in variational distance, asymptotically almost surely. The system has finite memory length equal to the block length, and can be thought of as a blockwise application of a time-invariant nonlinear filter with initial conditions determined from the previous block. Comparisons are presented with several existing schemes for universal vector quantization, which do not include parameter estimation explicitly, and an extension to unbounded distortion measures is outlined. Finally, finite mixture classes and exponential families are given as explicit examples of parametric sources admitting joint universal compression and modeling schemes of the kind studied here. Keywords: Learning, minimum-distance density estimation, two-stage codes, universal vector quantization, Vapnik–Chervonenkis dimension.
Complexity of simple nonlogarithmic loss functions
 IEEE Transactions on Information Theory, 2003
Cited by 2 (0 self)
Abstract — The loss complexity for non-logarithmic loss functions is defined analogously to the stochastic complexity for logarithmic loss functions, such that its mean provides an achievable lower bound for estimation, the mean taken with respect to the worst-case data-generating distribution. The loss complexity also provides a lower bound for the worst-case mean prediction error for all predictors. For the important loss functions |ê|^γ, where ê denotes the prediction or fitting error and γ is in the interval [1, 2], an accurate asymptotic formula for the loss complexity is given. Index Terms — loss functions, complexity, maximum entropy, min-max bounds, prediction bound.
Joint universal lossy coding and identification of stationary mixing sources with general alphabets
 IEEE Trans. Inform. Theory
Cited by 1 (1 self)
We consider the problem of joint universal variable-rate lossy coding and identification for parametric classes of stationary β-mixing sources with general (Polish) alphabets. Compression performance is measured in terms of Lagrangians, while identification performance is measured by the variational distance between the true source and the estimated source. Provided that the sources are mixing at a sufficiently fast rate and satisfy certain smoothness and Vapnik–Chervonenkis learnability conditions, it is shown that, for bounded metric distortions, there exist universal schemes for joint lossy compression and identification whose Lagrangian redundancies converge to zero as √(V_n log n / n) as the block length n tends to infinity, where V_n is the Vapnik–Chervonenkis dimension of a certain class of decision regions defined by the n-dimensional marginal distributions of the sources; furthermore, for each n, the decoder can identify the n-dimensional marginal of the active source up to a ball of radius O(√(V_n log n / n)) in variational distance, eventually with probability one. The results are supplemented by several examples of parametric sources satisfying the regularity conditions. Keywords: Learning, minimum-distance density estimation, two-stage codes, universal vector quantization, Vapnik–Chervonenkis dimension.