Analysis of Multiresolution Image Denoising Schemes Using Generalized-Gaussian Priors
 IEEE TRANS. INFO. THEORY
, 1998
In this paper, we investigate various connections between wavelet shrinkage methods in image processing and Bayesian estimation using Generalized Gaussian priors. We present fundamental properties of the shrinkage rules implied by Generalized Gaussian and other heavy-tailed priors. This allows us to show a simple relationship between differentiability of the log-prior at zero and the sparsity of the estimates, as well as an equivalence between universal thresholding schemes and Bayesian estimation using a certain Generalized Gaussian prior.
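The link between a log-prior that is not differentiable at zero and sparse estimates can be illustrated with a minimal sketch. It assumes a Generalized Gaussian prior with exponent 1 (a Laplacian) and additive Gaussian noise, for which the MAP estimate is the familiar soft-thresholding rule; the abstract does not say which exponent yields the equivalence with universal thresholding, so this choice is only illustrative. The function names and the `lam` parameter are hypothetical.

```python
import math

def universal_threshold(sigma, n):
    """Universal threshold sigma * sqrt(2 ln n) for n coefficients."""
    return sigma * math.sqrt(2.0 * math.log(n))

def soft_threshold(y, t):
    """Shrink y toward zero by t; values with |y| <= t become exactly zero."""
    return math.copysign(max(abs(y) - t, 0.0), y)

def map_laplacian(y, sigma, lam):
    """MAP estimate of x from y = x + N(0, sigma^2) noise under the
    (assumed) prior p(x) proportional to exp(-lam * |x|).
    Minimizing (y - x)^2 / (2 sigma^2) + lam * |x| gives soft
    thresholding with threshold lam * sigma^2."""
    return soft_threshold(y, lam * sigma ** 2)

if __name__ == "__main__":
    noisy = [3.0, -0.5, 0.2, -4.1]
    t = universal_threshold(sigma=0.5, n=len(noisy))
    print([soft_threshold(y, t) for y in noisy])
```

Because |x| has a kink at zero, every small coefficient is mapped exactly to zero, which is the sparsity behavior the abstract attributes to non-differentiable log-priors.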
The Context-Tree Weighting Method: Basic Properties
 IEEE Trans. Inform. Theory
, 1995
We describe a sequential universal data compression procedure for binary tree sources that performs the "double mixture." Using a context tree, this method weights in an efficient recursive way the coding distributions corresponding to all bounded memory tree sources, and achieves a desirable coding distribution for tree sources with an unknown model and unknown parameters. Computational and storage complexity of the proposed procedure are both linear in the source sequence length. We derive a natural upper bound on the cumulative redundancy of our method for individual sequences. The three terms in this bound can be identified as coding, parameter, and model redundancy. The bound holds for all source sequence lengths, not only for asymptotically large lengths. The analysis that leads to this bound is based on standard techniques and turns out to be extremely simple. Our upper bound on the redundancy shows that the proposed context-tree weighting procedure is optimal in the sense that it achieves the Rissanen (1984) lower bound.
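The recursive weighting can be sketched as follows. This is a batch illustration, not the linear-complexity sequential procedure the abstract describes: it assumes the Krichevsky-Trofimov (KT) estimator at each node (the abstract does not name the estimator) and the usual weighting of a node's own estimate against the product of its two children, each with weight 1/2. The function names and the `past` argument (the symbols preceding the coded sequence) are hypothetical.

```python
from fractions import Fraction

def kt_probability(zeros, ones):
    """KT block probability of a binary string with the given counts."""
    p = Fraction(1)
    a = b = 0
    for _ in range(zeros):
        p *= Fraction(2 * a + 1, 2 * (a + b + 1))
        a += 1
    for _ in range(ones):
        p *= Fraction(2 * b + 1, 2 * (a + b + 1))
        b += 1
    return p

def ctw_probability(bits, depth, past):
    """Weighted probability of `bits` over all tree sources of depth <= depth.
    `past` must hold at least `depth` symbols that precede `bits`."""
    if len(past) < depth:
        raise ValueError("need at least `depth` past symbols")
    counts = {}  # context (tuple, oldest symbol first) -> [zeros, ones]
    history = list(past)
    for x in bits:
        for d in range(depth + 1):
            ctx = tuple(history[len(history) - d:])
            counts.setdefault(ctx, [0, 0])[x] += 1
        history.append(x)

    def weighted(ctx):
        a, b = counts.get(ctx, (0, 0))
        if a + b == 0:
            return Fraction(1)  # unvisited subtree contributes nothing
        estimate = kt_probability(a, b)
        if len(ctx) == depth:
            return estimate  # leaf of the context tree
        # Mix "this node is a leaf of the model" with "split one symbol further".
        split = weighted((0,) + ctx) * weighted((1,) + ctx)
        return Fraction(1, 2) * estimate + Fraction(1, 2) * split

    return weighted(())

if __name__ == "__main__":
    print(ctw_probability([0, 1, 0, 1], depth=1, past=[1]))
```

Exact fractions are used so that the result can be checked by hand; an ideal code length follows as the negative base-2 logarithm of the returned probability.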
Universal prediction of individual sequences
 IEEE Transactions on Information Theory
, 1992
Abstract: The problem of predicting the next outcome of an individual binary sequence using finite memory is considered. The finite-state predictability of an infinite sequence is defined as the minimum fraction of prediction errors that can be made by any finite-state (FS) predictor. It is proved that this FS predictability can be attained by universal sequential prediction schemes. Specifically, an efficient prediction procedure based on the incremental parsing procedure of the Lempel-Ziv data compression algorithm is shown to achieve asymptotically the FS predictability. Finally, some relations between compressibility and predictability are pointed out, and the predictability is proposed as an additional measure of the complexity of a sequence. Index Terms: Predictability, compressibility, complexity, finite-state machines, Lempel-Ziv algorithm.
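A predictor driven by incremental parsing can be sketched as below. This is a deterministic simplification under stated assumptions, not necessarily the rule analyzed in the paper: it grows a Lempel-Ziv parsing tree, predicts at each node the symbol seen more often there (ties go to 0), and restarts at the root whenever a new phrase is completed. The function name and node layout are hypothetical.

```python
def lz_prediction_errors(bits):
    """Count prediction errors of a majority-vote predictor that walks
    the incremental-parsing (LZ78-style) tree of a binary sequence."""
    def new_node():
        return {"count": [0, 0], "child": [None, None]}

    root = new_node()
    node = root
    errors = 0
    for x in bits:
        # Predict the symbol observed more often at the current node.
        guess = 1 if node["count"][1] > node["count"][0] else 0
        if guess != x:
            errors += 1
        node["count"][x] += 1
        if node["child"][x] is None:
            # The current phrase is new: add it to the tree and restart.
            node["child"][x] = new_node()
            node = root
        else:
            node = node["child"][x]
    return errors

if __name__ == "__main__":
    sequence = [0, 1] * 50
    print(lz_prediction_errors(sequence) / len(sequence))
```

On long regular sequences the phrases get longer and the deeper nodes capture the pattern, so the error fraction printed by the usage example is expected to shrink as the sequence grows.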
Universal prediction
 IEEE Transactions on Information Theory
, 1998
Abstract: This paper is an overview of universal prediction from an information-theoretic perspective. Special attention is given to the notion of probability assignment under the self-information loss function, which is directly related to the theory of universal data compression. Both the probabilistic setting and the deterministic setting of the universal prediction problem are described, with emphasis on the analogy and the differences between results in the two settings. Index Terms: Bayes envelope, entropy, finite-state machine, linear prediction, loss function, probability assignment, redundancy-capacity, stochastic complexity, universal coding, universal prediction.
Context-based adaptive binary arithmetic coding in the H.264/AVC video compression standard
 IEEE Transactions on Circuits and Systems for Video Technology
Context-based adaptive binary arithmetic coding (CABAC) as a normative part of the new ITU-T/ISO/IEC standard H.264/AVC for video compression is presented. By combining an adaptive binary arithmetic coding technique with context modeling, a high degree of adaptation and redundancy reduction is achieved. The CABAC framework also includes a novel low-complexity method for binary arithmetic coding and probability estimation that is well suited for efficient hardware and software implementations. CABAC significantly outperforms the baseline entropy coding method of H.264/AVC for the typical area of envisaged target applications. For a set of test sequences representing typical material used in broadcast applications and for a range of acceptable video quality of about 30 to 38 dB, average bit-rate savings of 9% to 14% are achieved. Index Terms: Binary arithmetic coding, CABAC, context modeling, entropy coding, H.264, MPEG-4 AVC.
Information-theoretic asymptotics of Bayes methods
 IEEE Transactions on Information Theory
, 1990
Abstract: In the absence of knowledge of the true density function, Bayesian models take the joint density function for a sequence of n random variables to be an average of densities with respect to a prior. We examine the relative entropy distance D_n between the true density and the Bayesian density and show that the asymptotic distance is (d/2)(log n) + c, where d is the dimension of the parameter vector. Therefore, the relative entropy rate D_n/n converges to zero at rate (log n)/n. The constant c, which we explicitly identify, depends only on the prior density function and the Fisher information matrix evaluated at the true parameter value. Consequences are given for density estimation, universal data compression, composite hypothesis testing, and stock market portfolio selection.
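For reference, the asymptotic expansion can be written out as below. The explicit form of the constant is given as it is commonly stated for this result rather than taken from the abstract, and the symbols $w$ (prior density), $I(\theta)$ (Fisher information matrix), and $\theta$ (true parameter value) are notation assumed here.

```latex
D_n \;=\; \frac{d}{2}\log n + c + o(1),
\qquad
c \;=\; -\frac{d}{2}\log(2\pi e)
      + \frac{1}{2}\log\det I(\theta)
      + \log\frac{1}{w(\theta)},
\qquad\text{so that}\qquad
\frac{D_n}{n} \;=\; O\!\left(\frac{\log n}{n}\right).
```

The first term grows with the number of parameters $d$, while $c$ collects the dependence on the prior and on the Fisher information, matching the abstract's description of the constant.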
Generalizing Case Frames Using a Thesaurus and the MDL Principle
 Computational Linguistics
, 1998
this paper, we confine ourselves to the former issue, and refer the interested reader to Li and Abe (1996), which deals with the latter issue
Information-Theoretic Determination of Minimax Rates of Convergence
 Ann. Stat
, 1997
In this paper, we present some general results determining minimax bounds on statistical risk for density estimation based on certain information-theoretic considerations. These bounds depend only on metric entropy conditions and are used to identify the minimax rates of convergence.
Identification of humans using gait
 IEEE Transactions on Image Processing
, 2004
Abstract: We propose a view-based approach to recognize humans from their gait. Two different image features have been considered: the width of the outer contour of the binarized silhouette of the walking person and the entire binary silhouette itself. To obtain the observation vector from the image features, we employ two different methods. In the first method, referred to as the indirect approach, the high-dimensional image feature is transformed to a lower dimensional space by generating what we call the frame-to-exemplar distance (FED). The FED vector captures both structural and dynamic traits of each individual. For compact and effective gait representation and recognition, the gait information in the FED vector sequences is captured in a hidden Markov model (HMM). In the second method, referred to as the direct approach, we work with the feature vector directly (as opposed to computing the FED) and train an HMM. We estimate the HMM parameters (specifically the observation probability) based on the distance between the exemplars and the image features. In this way, we avoid learning high-dimensional probability density functions. The statistical nature of the HMM lends overall robustness to representation and recognition. The performance of the methods is illustrated using several databases.
Balancing accuracy and parsimony in genetic programming
 EVOLUTIONARY COMPUTATION
, 1995
Genetic programming is distinguished from other evolutionary algorithms in that it uses tree representations of variable size instead of linear strings of fixed length. The flexible representation scheme is very important because it allows the underlying structure of the data to be discovered automatically. One primary difficulty, however, is that the solutions may grow too big without any improvement of their generalization ability. In this article we investigate the fundamental relationship between the performance and complexity of the evolved structures. The essence of the parsimony problem is demonstrated empirically by analyzing error landscapes of programs evolved for neural network synthesis. We consider genetic programming as a statistical inference problem and apply the Bayesian model-comparison framework to introduce a class of fitness functions with error and complexity terms. An adaptive learning method is then presented that automatically balances the model-complexity factor to evolve parsimonious programs without losing the diversity of the population needed for achieving the desired training accuracy. The effectiveness of this approach is empirically shown on the induction of sigma-pi neural networks for solving a real-world medical diagnosis problem as well as benchmark tasks.