## Bayesian Compressive Sensing (2007)

Citations: 136 (15 self)

### BibTeX

@MISC{Ji07bayesiancompressive,
  author = {Shihao Ji and Ya Xue and Lawrence Carin},
  title = {Bayesian Compressive Sensing},
  year = {2007}
}

### Abstract

The data of interest are assumed to be represented as N-dimensional real vectors that are compressible in some linear basis B, implying that the signal can be reconstructed accurately using only a small number M ≪ N of basis-function coefficients associated with B. Compressive sensing is a framework whereby one does not measure one of the aforementioned N-dimensional signals directly, but rather a set of related measurements, each a linear combination of the original underlying N-dimensional signal. The number of required compressive-sensing measurements is typically much smaller than N, offering the potential to simplify the sensing system. Let f denote the unknown underlying N-dimensional signal and let g denote a vector of compressive-sensing measurements; then one may approximate f accurately by utilizing knowledge of the (under-determined) linear relationship between f and g, in addition to the knowledge that f is compressible in B. In this paper we employ a Bayesian formalism for estimating the underlying signal f based on compressive-sensing measurements g. The proposed framework has the following properties: (i) in addition to estimating the underlying signal f, “error bars” are also estimated, giving a measure of confidence in the inverted signal; (ii) using knowledge of the error bars, a principled means is provided for determining when a sufficient number of compressive-sensing measurements have been performed.
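The inversion described in the abstract can be sketched in a few lines. The following is a hedged illustration, not the paper's exact algorithm: it takes B as the identity, fixes the noise precision rather than estimating it, and runs a basic RVM-style EM update for the per-coefficient precisions; all dimensions, the noise level, and the iteration count are arbitrary demo choices.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, K = 128, 60, 6                 # signal length, measurements (M << N), nonzeros

# Sparse signal; for simplicity the basis B is the identity, so f = w
w_true = np.zeros(N)
w_true[rng.choice(N, K, replace=False)] = 3.0 * rng.standard_normal(K)

Phi = rng.standard_normal((M, N)) / np.sqrt(M)   # random CS projections
g = Phi @ w_true + 0.005 * rng.standard_normal(M)

# RVM-style EM: posterior is N(mu, Sigma); alpha_j are per-coefficient precisions
alpha = np.ones(N)
alpha0 = 1.0 / 0.005**2                          # noise precision (held fixed here)
for _ in range(100):
    Sigma = np.linalg.inv(np.diag(alpha) + alpha0 * Phi.T @ Phi)
    mu = alpha0 * Sigma @ Phi.T @ g
    alpha = 1.0 / (mu**2 + np.diag(Sigma))       # precisions of irrelevant w_j diverge

error_bars = np.sqrt(np.diag(Sigma))             # per-coefficient "error bars"
rel_err = np.linalg.norm(mu - w_true) / np.linalg.norm(w_true)
print(f"relative reconstruction error: {rel_err:.3f}")
```

The posterior mean recovers the sparse signal, and the diagonal of the posterior covariance supplies the confidence measure referred to in property (i).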

### Citations

8603 | Elements of information theory - Cover, Thomas - 1991
Citation Context: ...sensing may be stopped. As discussed above, the estimated posterior on the signal f is a multivariate Gaussian distribution, with mean E(f) = Bµ and covariance Cov(f) = B Σ B^T. The differential entropy [33] for f therefore satisfies

    h(f) = −∫ p(f) log p(f) df
         = (1/2) log|B Σ B^T| + const
         = (1/2) log|Σ| + const
         = −(1/2) log|A + α_0 Φ^T Φ| + const,   (16)

where const is independent of the projection matrix Φ. R...
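The identity in (16) is easy to verify numerically, along with the monotonicity that makes it useful as a stopping/design criterion. The values of A = diag(α), α_0, and Φ below are arbitrary stand-ins:

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 32, 20
Phi = rng.standard_normal((M, N))
A = np.diag(rng.uniform(0.5, 2.0, N))    # diag(alpha); values are arbitrary
alpha0 = 25.0

# h(f) = (1/2) log|Sigma| + const = -(1/2) log|A + alpha0 Phi^T Phi| + const
Sigma = np.linalg.inv(A + alpha0 * Phi.T @ Phi)
h1 = 0.5 * np.linalg.slogdet(Sigma)[1]
h2 = -0.5 * np.linalg.slogdet(A + alpha0 * Phi.T @ Phi)[1]
assert np.isclose(h1, h2)

# One extra projection row r adds a PSD rank-1 term, so entropy cannot rise
r = rng.standard_normal((1, N))
Phi2 = np.vstack([Phi, r])
h_after = -0.5 * np.linalg.slogdet(A + alpha0 * Phi2.T @ Phi2)[1]
print(h2, h_after)    # h_after <= h2
```

Because each added measurement can only decrease the posterior entropy, sensing can be stopped once the decrease becomes negligible.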

1863 | Regression shrinkage and selection via the lasso - Tibshirani - 1996

1746 | Compressed sensing - Donoho - 2006
Citation Context: ...signal directly, such that most unnecessary measurements are avoided from the start? This question has been answered in the affirmative, with this spawning the new field of compressive sensing (CS) [5], [6]. When performing compressive measurements, one does not attempt to directly measure the N dominant wavelet coefficients, as this would require adapting to each new signal. Rather, in a CS measurement...

1727 | Ten Lectures on Wavelets - Daubechies
Citation Context: ...there have been significant advances in the development of orthonormal bases for compact representation of a wide class of discrete signals. An important example of this is the wavelet transform [1], [2], with which general signals are represented in terms of atomic elements localized in time and frequency, assuming that the data index represents time (it may similarly represent space). The localized...

1675 | Atomic decomposition by basis pursuit - Chen, Donoho, et al. - 1998
Citation Context: ...CS measurements if u is highly compressible in the basis Ψ. The utility of this framework has motivated development over the last few years of several techniques for performing the CS inversion v → û [7]–[11]. Before proceeding, it should be emphasized that CS is a framework that is not limited to wavelet-based representations. While wavelets played a key role in early developments of sparse signal r...

1318 | Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information - Candès, Romberg, et al. - 2006
Citation Context: ...the signal directly, such that most unnecessary measurements are avoided from the start? This question has been answered in the affirmative, with this spawning the new field of compressive sensing (CS) [5], [6]. When performing compressive measurements, one does not attempt to directly measure the N dominant wavelet coefficients, as this would require adapting to each new signal. Rather, in a CS measur...

1042 | Bayesian Theory - Bernardo, Smith - 1994
Citation Context: ...In a Bayesian formulation our understanding of the fact that w_s is sparse is formalized by placing a sparseness-promoting prior on w_s. A widely used sparseness prior is the Laplace density function [12], [13]:

    p(w|λ) = (λ/2)^N exp(−λ Σ_{i=1}^N |w_i|),   (5)

where in (5) and henceforth we drop the subscript s on w, recognizing that we are always interested in a sparse solution for the weights. Given the CS mea...
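As a quick illustration of the Laplace prior in (5) (with arbitrary λ, N, and w): its negative log is λ‖w‖₁ up to a constant, which is why MAP estimation under this prior reduces to ℓ1-regularized (lasso-style) regression.

```python
import numpy as np

lam, N = 2.0, 4
w = np.array([0.5, 0.0, -1.2, 0.3])

# log p(w | lambda) = N log(lam/2) - lam * sum_i |w_i|   (eq. (5))
log_prior = N * np.log(lam / 2.0) - lam * np.sum(np.abs(w))

# -log p(w|lam) = lam * ||w||_1 + const, i.e. the lasso penalty
assert np.isclose(-log_prior, lam * np.sum(np.abs(w)) - N * np.log(lam / 2.0))

# Sanity check: the 1-D density integrates to 1
x = np.linspace(-20.0, 20.0, 200_001)
p = (lam / 2.0) * np.exp(-lam * np.abs(x))
print(np.sum(p) * (x[1] - x[0]))   # ~ 1.0
```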

885 | A new fast and efficient image codec based on set partitioning in hierarchical trees - Said, Pearlman - 1996
Citation Context: ...the compressive properties of wavelets assures that ‖u − u_N‖_2^2 is typically small for N ≪ m, thereby motivating the use of wavelets in a new generation of compression techniques for images and video [3], [4]. While wavelets have had a profound impact on practical compression schemes, there are issues that warrant further investigation. For example, while most natural signals are highly compressible ...

766 | Probability, random variables, and stochastic processes, 2nd ed. - Papoulis - 1984

758 | Least angle regression - Efron, Hastie, et al.
Citation Context: ...t to zero with minimal impact on the reconstruction of u_i). This relationship to linear regression makes existing sparse regression algorithms particularly relevant for CS inversion, for example [12]–[15]. Each of the CS measurements {v_i}_{i=1,M} yields a corresponding regression “task” v_i → θ̂_i, and performing multiple such learning tasks has been referred to in the machine-learning community as multi-...

715 | A Bayesian analysis of some nonparametric problems - Ferguson - 1973
Citation Context: ...to model the statistics of the quadtrees, and the multi-task sharing mechanisms may be implemented using more-sophisticated MTL tools than those investigated here. For example, the Dirichlet process [37] has proven to be a very effective tool for multi-task learning; this type of model is also within the hierarchical Bayesian family, but with far more sophistication and generality than that considere...

556 | Sparse Bayesian learning and the relevance vector machine - Tipping - 2001
Citation Context: ...be set to zero with minimal impact on the reconstruction of u_i). This relationship to linear regression makes existing sparse regression algorithms particularly relevant for CS inversion, for example [12]–[15]. Each of the CS measurements {v_i}_{i=1,M} yields a corresponding regression “task” v_i → θ̂_i, and performing multiple such learning tasks has been referred to in the machine-learning community as m...

523 | Bayesian interpolation - MacKay, D. J. C. - 1992

472 | Multitask learning - Caruana - 1997
Citation Context: ...CS measurements {v_i}_{i=1,M} yields a corresponding regression “task” v_i → θ̂_i, and performing multiple such learning tasks has been referred to in the machine-learning community as multi-task learning [16]; û_i satisfies û_i = Ψ θ̂_i. Typical approaches to information transfer among tasks include: sharing hidden nodes in neural networks [16]–[18], placing a common prior in hierarchical Bayesian models [1...

426 | The Dantzig selector: Statistical estimation when p is much larger than n. Annals of Statistics 35:2313–2351 - Candès, Tao - 2007
Citation Context: ...sparsity measure than the ℓ1-norm, and prove that even in the worst-case scenario, the RVM still outperforms the most widely used sparse representation algorithms, including BP.² While previous works [23], [24] in CS do obtain ℓ2 error bounds for function estimates, the “error bars” may be more useful from a practical standpoint, as discussed in what follows.³ A simple modification to (10) is available from [...

327 | Information-based objective functions for active data selection - MacKay - 1992

325 | Wavelet-based statistical signal processing using hidden Markov models - Crouse, Nowak, et al. - 1998
Citation Context: ...cs, although the locations of the similar quadtrees are shifted within the image, commensurate with the associated object shift in the original image. Statistical models such as the hidden Markov tree [36] may be used to model the statistics of the quadtrees, and the multi-task sharing mechanisms may be implemented using more-sophisticated MTL tools than those investigated here. For example, the Dirich...

322 | A framework for learning predictive structures from multiple tasks and unlabeled data - Ando, Zhang - 2005
Citation Context: ...nodes in neural networks [16]–[18], placing a common prior in hierarchical Bayesian models [19]–[22], sharing parameters of Gaussian processes [23], sharing a common structure on the predictor space [24], and structured regularization in kernel methods [25], among others. In statistics, the problem of combining information from similar but independent experiments has been studied in the field of meta...

297 | Signal recovery from random measurements via orthogonal matching pursuit - Tropp, Gilbert - 2007

294 | Gradient Projection for Sparse Reconstruction: Application to Compressed Sensing and Other - Figueiredo, Nowak, et al. - 2007
Citation Context: ...measurements if u is highly compressible in the basis Ψ. The utility of this framework has motivated development over the last few years of several techniques for performing the CS inversion v → û [7]–[11]. Before proceeding, it should be emphasized that CS is a framework that is not limited to wavelet-based representations. While wavelets played a key role in early developments of sparse signal repres...

283 | Bayes and Empirical Bayes Methods for Data Analysis - Carlin, Louis - 1996
Citation Context: ...suggests an iterative algorithm that alternates between these global and local solutions, as outlined next. This framework is related to extensive research in statistics on empirical Bayesian analysis [32]. Specifically, all of the data {v_i}_{i=1,M} are used to constitute point estimates for the parameters α and α_0. Using the point estimate for α, one may specify the prior on the weights θ_i, via (3). Usin...

215 | The Relevance Vector Machine - Tipping - 2000
Citation Context: ...posterior density function for w and α_0. For example, one may conveniently implement a Markov chain Monte Carlo (MCMC) [17] or, more efficiently and approximately, a variational Bayesian (VB) analysis [18]. While the VB analysis is efficient relative to MCMC, in the RVM a type-II maximum-likelihood (ML) procedure is considered, with the objective of achieving highly efficient computations while still p...

211 | Algorithms for simultaneous sparse approximation - Tropp, Gilbert, et al. - 2006

174 | Sparse solution of underdetermined linear equations by stagewise orthogonal matching pursuit - Donoho, Tsaig, et al. - 2007
Citation Context: ...i.e., measures of uncertainty, adaptive design of projections, etc. [10]), (hierarchical) Bayesian analysis also provides a flexible framework for multi-task CS. Conventional CS inverse algorithms [7]–[9] typically employ a point estimate for θ_i, and therefore are not directly amenable to information transfer among related multiple CS tasks. In addition to developing a multi-task CS framework, a modi...

170 | Signal reconstruction from noisy random projections - Haupt, Nowak - 1998
Citation Context: ...sparsity measure than the ℓ1-norm, and prove that even in the worst-case scenario, the RVM still outperforms the most widely used sparse representation algorithms, including BP.² While previous works [23], [24] in CS do obtain ℓ2 error bounds for function estimates, the “error bars” may be more useful from a practical standpoint, as discussed in what follows.³ A simple modification to (10) is available from [25] by...

158 | Learning multiple tasks with kernel methods - Evgeniou, Micchelli, et al.
Citation Context: ...prior in hierarchical Bayesian models [19]–[22], sharing parameters of Gaussian processes [23], sharing a common structure on the predictor space [24], and structured regularization in kernel methods [25], among others. In statistics, the problem of combining information from similar but independent experiments has been studied in the field of meta-analysis [26] for a variety of applications in medici...

147 | Signal recovery from partial information via orthogonal matching pursuit - Tropp, Gilbert

132 | Sparse Solutions to Linear Inverse Problems with Multiple Measurement Vectors - Cotter, Rao, et al. - 2005

126 | Approaches for Bayesian variable selection - George, McCulloch - 1997
Citation Context: ...highly efficient computations while still preserving accurate results. As one may note, the Bayesian linear model considered in the RVM is essentially one of the simplified models for Bayesian model selection [19]–[21]. Although more accurate models may be desired, the main motivation for adopting the RVM is its highly efficient computation, as discussed below. B. Bayesian CS Inversion via RVM: Assume the ...

117 | Calibration and empirical Bayes variable selection - George, Foster - 2000

97 | A Wavelet Tour of Signal Processing, 2nd ed. - Mallat - 1999
Citation Context: ...ve sensing (CS), Multi-task learning, Sparse Bayesian learning, Hierarchical Bayesian modeling, Modified relevance vector machine. I. INTRODUCTION. The development of wavelets [1], [2] has had a significant impact on several areas of signal processing and compression. An important characteristic of wavelets is the sparse representation of most natural signals in terms of a wavelet ...

97 | Learning Gaussian processes from multiple tasks - Yu, Tresp, et al. - 2005

90 | Learning internal representations - Baxter - 1995

84 | Primary, secondary, and meta-analysis of research - Glass - 1976
Citation Context: ...structured regularization in kernel methods [25], among others. In statistics, the problem of combining information from similar but independent experiments has been studied in the field of meta-analysis [26] for a variety of applications in medicine, psychology and education. Hierarchical Bayesian modeling is one of the most important methods for meta-analysis [27]–[31]. Hierarchical Bayesian models prov...

76 | The Bayesian Choice: From Decision-Theoretic Motivations to Computational Implementation - Robert - 2001
Citation Context: ...efficient computations while still preserving accurate results. As one may note, the Bayesian linear model considered in the RVM is essentially one of the simplified models for Bayesian model selection [19]–[21]. Although more accurate models may be desired, the main motivation for adopting the RVM is its highly efficient computation, as discussed below. B. Bayesian CS Inversion via RVM: Assume the hyper...

76 | Sparse Bayesian learning for basis selection - Wipf, Rao - 2004
Citation Context: ...], [24] in CS do obtain ℓ2 error bounds for function estimates, the “error bars” may be more useful from a practical standpoint, as discussed in what follows.³ A simple modification to (10) is available from [25] by exploiting the matrix inverse identity, which leads to an O(K^3) operation per iteration. Nonetheless, the iterative (EM) implementation still does not scale well. ...

66 | Fast marginal likelihood maximisation for sparse Bayesian models - Tipping, Faul
Citation Context: ...followed by algebra, yields

    α_j^new = M / Σ_{i=1}^M µ_{i,j}^2,   j ∈ {1, 2, …, m},   (15)

    α_0^new = (Σ_{i=1}^M n_i) / (Σ_{i=1}^M ‖v_i − Φ_i µ_i‖^2),   (16)

where µ_{i,j} is the jth component of µ_i. 2) Fast Algorithm: Similar to [33], considering the dependence of L(α, α_0) on a single hyperparameter α_j, j ∈ {1, 2, …, m}, we can decompose C_i in (14) as

    C_i = α_0^{-1} I + Σ_{k≠j} α_k^{-1} Φ_{i,k} Φ_{i,k}^T + α_j^{-1} Φ_{i,j} Φ_{i,j}^T = C_{i,−j} + α...
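The point-estimate updates (15)-(16) amount to a few array operations. The sketch below uses made-up task sizes and stand-in posterior means µ_i purely to show the shapes involved; all names and values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
M_tasks, m = 3, 10                          # number of CS tasks, weights per task
n = [8, 9, 7]                               # measurements n_i per task

Phi = [rng.standard_normal((ni, m)) for ni in n]
v = [rng.standard_normal(ni) for ni in n]
mu = [0.1 * rng.standard_normal(m) for _ in range(M_tasks)]  # stand-in posterior means

# Eq. (15): alpha_j_new = M / sum_i mu_{i,j}^2, one value per weight index j
alpha_new = M_tasks / np.sum([mu_i**2 for mu_i in mu], axis=0)

# Eq. (16): alpha0_new = (sum_i n_i) / (sum_i ||v_i - Phi_i mu_i||^2)
alpha0_new = sum(n) / sum(float(np.sum((vi - Pi @ mi) ** 2))
                          for vi, Pi, mi in zip(v, Phi, mu))
print(alpha_new.shape, alpha0_new > 0)
```

Note that the weight precisions α_j are shared across all M tasks, which is how information transfers between the related CS inversions.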

65 | Bayesian Data Analysis, 2nd ed. - Gelman, Carlin - 2004
Citation Context: ...accomplished using the Laplace prior directly, since the Laplace prior is not conjugate to the Gaussian likelihood and hence the associated Bayesian inference may not be performed in closed form [12], [15]. This issue has been addressed previously in sparse Bayesian learning, particularly with the relevance vector machine (RVM) [16]. Rather than imposing a Laplace prior on w, in the RVM a hierarchical...
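The hierarchical construction mentioned here (a zero-mean Gaussian on each weight, with a Gamma prior on its precision) promotes sparsity because the marginal over the weight is a heavy-tailed Student-t. A quick Monte Carlo check, with arbitrary Gamma parameters:

```python
import numpy as np

rng = np.random.default_rng(3)
a, b, n = 5.0, 1.0, 200_000        # Gamma shape/rate (arbitrary), sample count

# w | alpha ~ N(0, 1/alpha), alpha ~ Ga(a, b)  =>  w ~ Student-t(2a) marginally
alpha = rng.gamma(shape=a, scale=1.0 / b, size=n)
w = rng.standard_normal(n) / np.sqrt(alpha)

# Heavier tails and a sharper peak than a Gaussian of matched variance
excess_kurtosis = np.mean(w**4) / np.mean(w**2) ** 2 - 3.0
print(excess_kurtosis > 0)    # True: the marginal is leptokurtic
```

Unlike the Laplace prior, each conditional stage here is conjugate to the Gaussian likelihood, which is what makes closed-form RVM inference possible.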

51 | Efficient, low-complexity image coding with a set-partitioning embedded block coder - Pearlman, Islam, et al.
Citation Context: ...compressive properties of wavelets assures that ‖u − u_N‖_2^2 is typically small for N ≪ m, thereby motivating the use of wavelets in a new generation of compression techniques for images and video [3], [4]. While wavelets have had a profound impact on practical compression schemes, there are issues that warrant further investigation. For example, while most natural signals are highly compressible in a ...

47 | Learning to learn with the informative vector machine - Lawrence, Platt - 2004
Citation Context: ...information transfer among tasks include: sharing hidden nodes in neural networks [16]–[18], placing a common prior in hierarchical Bayesian models [19]–[22], sharing parameters of Gaussian processes [23], sharing a common structure on the predictor space [24], and structured regularization in kernel methods [25], among others. In statistics, the problem of combining information from similar but indep...

45 | Learning multiple related tasks using latent independent component analysis - Zhang, Ghahramani, et al.
Citation Context: ...û_i satisfies û_i = Ψ θ̂_i. Typical approaches to information transfer among tasks include: sharing hidden nodes in neural networks [16]–[18], placing a common prior in hierarchical Bayesian models [19]–[22], sharing parameters of Gaussian processes [23], sharing a common structure on the predictor space [24], and structured regularization in kernel methods [25], among others. In statistics, the problem ...

41 | Analysis of sparse Bayesian learning - Faul, Tipping
Citation Context: ...more robust to the parameter setting than the original RVM formulation in Sec. II. As we will see soon, this modified formalism and the fast inference algorithm can be derived in a manner parallel to [34]. We define a zero-mean Gaussian prior for each component of θ_i, and define a Gamma prior on the noise precision α_0:

    p(θ_i|α, α_0) = Π_{j=1}^m N(θ_{i,j} | 0, α_0^{-1} α_j^{-1}),   (26)

    p(α_0|a, b) = Ga(α_0|a, b). ...

40 | An empirical Bayesian strategy for solving the simultaneous sparse approximation problem - Wipf, Rao - 2007

39 | A New View of Automatic Relevance Determination - Wipf, Nagarajan

38 | Adaptive sparseness using Jeffreys prior - Figueiredo

31 | A Method for Combining Inference Across Related Nonparametric Bayesian Models - Muller, Quintana, et al. - 2004

25 | Simultaneous sparse approximation via greedy pursuit - Tropp, Gilbert, et al. - 2005

24 | Perspectives on sparse Bayesian learning - Wipf, Palmer, et al. - 2006
Citation Context: ...only a relatively small set of w_i, for which the corresponding α_i remains relatively small, contribute to the representation of g, and the level of sparseness (size of M) is determined automatically (see [22] for an interesting explanation from a variational approximation perspective). It is also important to note that, as a result of the type-II ML estimate (11), the point estimates (rather than the post...

24 | Theory of Optimal Experiments - Fedorov, V. - 1971
Citation Context: ...selecting projection r_{K+1}, with the goal of reducing uncertainty. Such a framework has been previously studied in the machine learning community under the name of experimental design or active learning [30]–[32]. Further, the error bars also give a way to determine how many measurements are enough for faithful CS reconstruction, i.e., when the change in the uncertainty is not significant, it may be assu...

22 | A nonparametric hierarchical Bayesian framework for information filter - Yu, Tresp, et al. - 2004