Image Quality Assessment: From Error Visibility to Structural Similarity
 IEEE TRANSACTIONS ON IMAGE PROCESSING
, 2004
Objective methods for assessing perceptual image quality have traditionally attempted to quantify the visibility of errors between a distorted image and a reference image using a variety of known properties of the human visual system. Under the assumption that human visual perception is highly adapted for extracting structural information from a scene, we introduce an alternative framework for quality assessment based on the degradation of structural information. As a specific example of this concept, we develop a Structural Similarity Index and demonstrate its promise through a set of intuitive examples, as well as comparison to both subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG and JPEG2000.
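As a rough illustration of the Structural Similarity Index named in this abstract, the sketch below evaluates the commonly quoted form of the index over a single global window. The abstract itself gives no formula, so the constants (k1 = 0.01, k2 = 0.03, dynamic range 255) and the single-window simplification are assumptions made here; practical implementations typically average the index over local windows. The function name `ssim_global` is hypothetical.

```python
def ssim_global(x, y, dynamic_range=255.0, k1=0.01, k2=0.03):
    """Single-window SSIM between two equal-length lists of pixel values.

    Assumption: one global window instead of local sliding windows,
    with the commonly quoted stabilizing constants k1 and k2.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have the same length (at least 2)")
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    # Unbiased sample variances and covariance.
    var_x = sum((a - mu_x) ** 2 for a in x) / (n - 1)
    var_y = sum((b - mu_y) ** 2 for b in y) / (n - 1)
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / (n - 1)
    # Small constants that keep the ratios stable near zero.
    c1 = (k1 * dynamic_range) ** 2
    c2 = (k2 * dynamic_range) ** 2
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


reference = [10, 50, 90, 130, 170, 210]
flat = [110] * 6  # same mean as the reference, but all structure removed
score_identical = ssim_global(reference, reference)
score_flat = ssim_global(reference, flat)
```

An image compared with itself scores 1 (up to rounding), while the flat image, which keeps the mean luminance but loses all structure, scores close to 0.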
Learning low-level vision
 International Journal of Computer Vision
, 2000
We show a learning-based method for low-level vision problems. We set up a Markov network of patches of the image and the underlying scene. A factorization approximation allows us to easily learn the parameters of the Markov network from synthetic examples of image/scene pairs, and to efficiently propagate image information. Monte Carlo simulations justify this approximation. We apply this to the "super-resolution" problem (estimating high frequency details from a low-resolution image), showing good results. For the motion estimation problem, we show resolution of the aperture problem and filling-in arising from application of the same probabilistic machinery.
Image denoising using a scale mixture of Gaussians in the wavelet domain
 IEEE TRANS IMAGE PROCESSING
, 2003
We describe a method for removing noise from digital images, based on a statistical model of the coefficients of an overcomplete multiscale oriented basis. Neighborhoods of coefficients at adjacent positions and scales are modeled as the product of two independent random variables: a Gaussian vector and a hidden positive scalar multiplier. The latter modulates the local variance of the coefficients in the neighborhood, and is thus able to account for the empirically observed correlation between the coefficient amplitudes. Under this model, the Bayesian least squares estimate of each coefficient reduces to a weighted average of the local linear estimates over all possible values of the hidden multiplier variable. We demonstrate through simulations with images contaminated by additive white Gaussian noise that the performance of this method substantially surpasses that of previously published methods, both visually and in terms of mean squared error.
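The estimator described in this abstract, a weighted average of local linear estimates over the values of the hidden multiplier, can be sketched for a single scalar coefficient. This is a simplified illustration, not the paper's implementation: the abstract models whole neighborhoods of coefficients, whereas the sketch assumes one coefficient, a discrete grid of multiplier values `z_values`, and a uniform prior over that grid. The name `bls_gsm_scalar` and all parameters are hypothetical.

```python
import math


def bls_gsm_scalar(y, sigma_u2, sigma_w2, z_values, z_prior=None):
    """Least squares estimate of x from y = sqrt(z) * u + w (scalar case).

    u ~ N(0, sigma_u2), w ~ N(0, sigma_w2), z is the hidden multiplier.
    Assumption: z takes values on a finite grid with a uniform prior
    unless z_prior is given.
    """
    if z_prior is None:
        z_prior = [1.0 / len(z_values)] * len(z_values)
    posteriors = []
    linear_estimates = []
    for z, p_z in zip(z_values, z_prior):
        var_y = z * sigma_u2 + sigma_w2
        # Likelihood of the observation given this multiplier value.
        likelihood = math.exp(-0.5 * y * y / var_y) / math.sqrt(2 * math.pi * var_y)
        posteriors.append(likelihood * p_z)
        # Local linear (Wiener) estimate for this multiplier value.
        linear_estimates.append(z * sigma_u2 / var_y * y)
    total = sum(posteriors)
    return sum(p * e for p, e in zip(posteriors, linear_estimates)) / total


grid = [0.1, 1.0, 10.0, 100.0]
small = bls_gsm_scalar(0.5, 1.0, 1.0, grid)
large = bls_gsm_scalar(10.0, 1.0, 1.0, grid)
```

Small observations are shrunk strongly because they are best explained by a small multiplier, while large observations pass through almost unchanged.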
A Parametric Texture Model based on Joint Statistics of Complex Wavelet Coefficients
 INTERNATIONAL JOURNAL OF COMPUTER VISION
, 2000
We present a universal statistical model for texture images in the context of an overcomplete complex wavelet transform. The model is parameterized by a set of statistics computed on pairs of coefficients corresponding to basis functions at adjacent spatial locations, orientations, and scales. We develop an efficient algorithm for synthesizing random images subject to these constraints, by iteratively projecting onto the set of images satisfying each constraint, and we use this to test the perceptual validity of the model. In particular, we demonstrate the necessity of subgroups of the parameter set by showing examples of texture synthesis that fail when those parameters are removed from the set. We also demonstrate the power of our model by successfully synthesizing examples drawn from a diverse collection of artificial and natural textures.
Example-based super-resolution
 IEEE COMPUT. GRAPH. APPL
, 2001
The Problem: Pixel representations for images do not have resolution independence. When we zoom into a bitmapped image, we get a blurred image. Figure 1 shows the problem for a teapot image, rich with real-world detail. We know the teapot’s features should remain sharp as we zoom in on them, yet standard pixel interpolation methods, such as pixel replication (b, c) and cubic spline interpolation (d, e), introduce artifacts or blurring of edges. For images zoomed 3 octaves, such as these, sharpening the interpolated result has little useful effect (f, g). Many applications in graphics or image processing could benefit from such pixel resolution independence, such as texture mapping, enlarging consumer photographs, and converting NTSC video content to HDTV. We don’t expect perfect resolution independence (even the polygon representation doesn’t have that), but increasing the resolution independence of pixel-based representations is an important task for image-based rendering. Our example-based super-resolution algorithm yields Fig. 1 (h, i). Previous Work: Researchers have long studied image interpolation, although only recently using machine learning or sampling approaches, which offer much power. Cubic spline interpolation [5] is a very common image interpolation function, but suffers from blurring of edges and image details. Recent attempts to improve on cubic spline interpolation [6, 8, 2] have met with limited success. Schreiber and collaborators [6] proposed a sharpened Gaussian interpolator function to minimize information
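For reference, pixel replication, one of the baseline interpolation methods mentioned in this excerpt, can be sketched in a few lines. This shows only the baseline, not the example-based algorithm; the function name `pixel_replicate` and the list-of-rows image layout are assumptions made for illustration.

```python
def pixel_replicate(image, factor):
    """Zoom a 2-D image (list of rows) by repeating each pixel.

    Every pixel becomes a factor-by-factor block, which is why edges
    look blocky at large zoom factors.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    zoomed = []
    for row in image:
        wide_row = [pixel for pixel in row for _ in range(factor)]
        # Repeat the widened row vertically, copying so rows stay independent.
        zoomed.extend(list(wide_row) for _ in range(factor))
    return zoomed


zoomed = pixel_replicate([[1, 2], [3, 4]], 2)
```

No new detail is created: the zoomed image contains exactly the same values as the input, only spread over larger blocks.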
Face recognition by independent component analysis
 IEEE Transactions on Neural Networks
, 2002
A number of current face recognition algorithms use face representations found by unsupervised statistical methods. Typically these methods find a set of basis images and represent faces as a linear combination of those images. Principal component analysis (PCA) is a popular example of such methods. The basis images found by PCA depend only on pairwise relationships between pixels in the image database. In a task such as face recognition, in which important information may be contained in the high-order relationships among pixels, it seems reasonable to expect that better basis images may be found by methods sensitive to these high-order statistics. Independent component analysis (ICA), a generalization of PCA, is one such method. We used a version of ICA derived from the principle of optimal information transfer through sigmoidal neurons. ICA was performed on face images in the FERET database under two different architectures, one which treated the images as random variables and the pixels as outcomes, and a second which treated the pixels as random variables and the images as outcomes. The first architecture found spatially local basis images for the faces. The second architecture produced a factorial face code. Both ICA representations were superior to representations based on PCA for recognizing faces across days and changes in expression. A classifier that combined the two ICA representations gave the best performance. Index Terms: Eigenfaces, face recognition, independent component analysis (ICA), principal component analysis (PCA), unsupervised learning.
Classifying Facial Actions
 IEEE Trans. Pattern Anal and Machine Intell
, 1999
The Facial Action Coding System (FACS) [23] is an objective method for quantifying facial movement in terms of component actions. This system is widely used in behavioral investigations of emotion, cognitive processes, and social interaction. The coding is presently performed by highly trained human experts. This paper explores and compares techniques for automatically recognizing facial actions in sequences of images. These techniques include analysis of facial motion through estimation of optical flow; holistic spatial analysis, such as principal component analysis, independent component analysis, local feature analysis, and linear discriminant analysis; and methods based on the outputs of local filters, such as Gabor wavelet representations and local principal components. Performance of these systems is compared to naive and expert human subjects. Best performances were obtained using the Gabor wavelet representation and the independent component representation, both of which achieved 96 percent accuracy for classifying 12 facial actions of the upper and lower face. The results provide converging evidence for the importance of using local filters, high spatial frequencies, and statistical independence for classifying facial actions.
Deriving Intrinsic Images from Image Sequences
, 2001
Intrinsic images are a useful mid-level description of scenes proposed by Barrow and Tenenbaum [1]. An image is decomposed into two images: a reflectance image and an illumination image. Finding such a decomposition remains a difficult problem in computer vision. Here we focus on a slightly easier problem: given a sequence of T images where the reflectance is constant and the illumination changes, can we recover T illumination images and a single reflectance image? We show that this problem is still ill-posed and suggest approaching it as a maximum-likelihood estimation problem. Following recent work on the statistics of natural images, we use a prior that assumes that illumination images will give rise to sparse filter outputs. We show that this leads to a simple, novel algorithm for recovering reflectance images. We illustrate the algorithm's performance on real and synthetic image sequences.
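One way to see why a sparse prior on illumination filter outputs leads to a simple algorithm: if those outputs are modeled as Laplacian-distributed, the maximum-likelihood estimate of a constant offset is the median across the sequence. The sketch below applies this to 1-D signals with a first-difference filter in the log domain. The abstract does not spell out the algorithm, so the median estimator, the choice of filter, and the name `reflectance_log_derivative` are assumptions made for illustration.

```python
import math
from statistics import median


def reflectance_log_derivative(sequence):
    """Estimate the log-reflectance derivative from T 1-D images.

    Each image is reflectance * illumination with positive values.
    Assumption: illumination derivatives are sparse, so the median over
    time of the log-image derivatives isolates the reflectance term.
    """
    log_images = [[math.log(p) for p in image] for image in sequence]
    derivatives = [
        [row[i + 1] - row[i] for i in range(len(row) - 1)]
        for row in log_images
    ]
    # Median across the T images at each spatial position.
    return [median(column) for column in zip(*derivatives)]


reflectance = [1.0, 1.0, 4.0, 4.0, 4.0, 2.0]
illuminations = [
    [0.5, 1.0, 1.0, 1.0, 1.0, 1.0],  # shadow edge on the left
    [1.0, 1.0, 1.0, 1.0, 0.5, 0.5],  # shadow edge on the right
    [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],  # uniform lighting
]
images = [[r * l for r, l in zip(reflectance, ill)] for ill in illuminations]
estimate = reflectance_log_derivative(images)
expected = [
    math.log(reflectance[i + 1]) - math.log(reflectance[i])
    for i in range(len(reflectance) - 1)
]
```

Because each shadow edge appears in only one of the three images, the median discards it and keeps the reflectance edges that are present in every frame. Integrating the estimate recovers the log-reflectance up to an additive constant.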
Image compression via joint statistical characterization in the wavelet domain
, 1997
We develop a statistical characterization of natural images in the wavelet transform domain. This characterization describes the joint statistics between pairs of subband coefficients at adjacent spatial locations, orientations, and scales. We observe that the raw coefficients are nearly decorrelated, but their magnitudes are highly correlated. A linear magnitude predictor coupled with both multiplicative and additive uncertainties accounts for the joint coefficient statistics of a wide variety of images including photographic images, graphical images, and medical images. In order to directly demonstrate the power of this model, we construct an image coder called EPWIC (Embedded Predictive Wavelet Image Coder), in which subband coefficients are encoded one bitplane at a time using a nonadaptive arithmetic encoder that utilizes probabilities calculated from the model. Bitplanes are ordered using a greedy algorithm that considers the MSE reduction per encoded bit. The decoder uses the statistical model to predict coefficient values based on the bits it has received. The rate-distortion performance of the coder compares favorably with the current best image coders in the literature.
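The observation that raw coefficients can be nearly decorrelated while their magnitudes remain correlated can be reproduced with a toy simulation. The sketch below does not use real wavelet coefficients: it assumes synthetic pairs that share a random scale factor, which is one simple way to produce this behavior. All names and the choice of a log-normal scale are hypothetical.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def simulate_pairs(n, seed=0):
    """Draw n coefficient pairs that share a hidden random scale."""
    rng = random.Random(seed)
    first, second = [], []
    for _ in range(n):
        scale = math.exp(0.5 * rng.gauss(0.0, 1.0))  # shared local scale
        first.append(scale * rng.gauss(0.0, 1.0))
        second.append(scale * rng.gauss(0.0, 1.0))
    return first, second


first, second = simulate_pairs(20000)
raw_corr = pearson(first, second)
mag_corr = pearson([abs(v) for v in first], [abs(v) for v in second])
```

In this toy model the signs of the two coefficients are independent, so the raw correlation stays near zero, while the shared scale makes large magnitudes tend to occur together.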
Bivariate Shrinkage Functions for Wavelet-Based Denoising Exploiting Interscale Dependency
, 2002
Most simple nonlinear thresholding rules for wavelet-based denoising assume that the wavelet coefficients are independent. However, wavelet coefficients of natural images have significant dependencies. In this paper, we will only consider the dependencies between the coefficients and their parents in detail. For this purpose, new non-Gaussian bivariate distributions are proposed, and corresponding nonlinear threshold functions (shrinkage functions) are derived from the models using Bayesian estimation theory. The new shrinkage functions do not assume the independence of wavelet coefficients. We will show three image denoising examples in order to show the performance of these new bivariate shrinkage rules. In the second example, a simple subband-dependent data-driven image denoising system is described and compared with effective data-driven techniques in the literature, namely VisuShrink, SureShrink, BayesShrink, and hidden Markov models. In the third example, the same idea is applied to the dual-tree complex wavelet coefficients.
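A shrinkage rule that depends on both a coefficient and its parent can be sketched as follows. The abstract does not state a formula, so the form used here, a soft threshold of sqrt(3) * sigma_n^2 / sigma applied to the joint magnitude of child and parent, is the commonly cited bivariate shrinkage rule and should be read as an assumption. The name `bivariate_shrink` and its parameters are hypothetical.

```python
import math


def bivariate_shrink(child, parent, sigma_n, sigma):
    """Shrink a noisy wavelet coefficient using its parent coefficient.

    child, parent: noisy coefficient and its coarser-scale parent
    sigma_n: noise standard deviation
    sigma: signal standard deviation (assumed known or estimated locally)
    """
    magnitude = math.hypot(child, parent)
    if magnitude == 0.0:
        return 0.0
    threshold = math.sqrt(3.0) * sigma_n ** 2 / sigma
    # Soft-threshold the joint magnitude, then scale the child accordingly.
    gain = max(magnitude - threshold, 0.0) / magnitude
    return gain * child


isolated = bivariate_shrink(0.5, 0.0, 1.0, 1.0)
supported = bivariate_shrink(0.5, 10.0, 1.0, 1.0)
```

A small coefficient with no parent support is set to zero, whereas the same coefficient next to a large parent is kept with only mild shrinkage. An independent threshold on the child alone could not make that distinction.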