
## Multi-Task Compressive Sensing (2008)

Citations: 323 (24 self)

### Citations

12168 | Elements of information theory
- Cover, Thomas
- 1991
Citation Context ...nsing may be stopped. As discussed above, the estimated posterior on the signal f is a multivariate Gaussian distribution, with mean E(f) = Bµ and covariance Cov(f) = BΣB^T. The differential entropy [33] for f therefore satisfies h(f) = −∫ p(f) log p(f) df = (1/2) log|BΣB^T| + const = (1/2) log|Σ| + const = −(1/2) log|A + α0Φ^TΦ| + const, (16) where const is independent of the projection matrix Φ. R... |
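The entropy expression quoted in this context lends itself to a direct numerical check: with posterior precision A + α0Φ^TΦ, appending a measurement row to Φ can only reduce the Gaussian posterior's differential entropy. A minimal NumPy sketch, with variable names that are illustrative rather than taken from the paper:

```python
import numpy as np

def posterior_entropy(Phi, alpha, alpha0):
    """Differential entropy (up to an additive constant) of the Gaussian
    posterior over the weights: h = -(1/2) log|A + alpha0 * Phi^T Phi|,
    with A = diag(alpha). Smaller entropy means less uncertainty."""
    H = np.diag(alpha) + alpha0 * Phi.T @ Phi   # posterior precision matrix
    sign, logdet = np.linalg.slogdet(H)          # safer than log(det(H))
    return -0.5 * logdet

rng = np.random.default_rng(0)
Phi = rng.standard_normal((20, 10))
alpha = np.ones(10)
h1 = posterior_entropy(Phi, alpha, alpha0=10.0)

# Adding one more projection adds a rank-one PSD term to the precision,
# so the entropy can only shrink (or stay the same):
Phi2 = np.vstack([Phi, rng.standard_normal((1, 10))])
h2 = posterior_entropy(Phi2, alpha, alpha0=10.0)
assert h2 <= h1
```

This monotonicity is what makes the entropy usable as a stopping criterion: when an additional projection no longer reduces it appreciably, sensing may be stopped.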

3998 | Regression shrinkage and selection via the lasso - Tibshirani - 1996 |

3543 | Compressed sensing
- Donoho
- 2006
Citation Context ...nal directly, such that most unnecessary measurements are avoided from the start? This question has been answered in the affirmative, with this spawning the new field of compressive sensing (CS) [5], [6]. When performing compressive measurements, one does not attempt to directly measure the N dominant wavelet coefficients, as this would require adapting to each new signal. Rather, in a CS measurement... |

2683 | Atomic decomposition by basis pursuit
- Chen, Donoho, et al.
- 1998
Citation Context ...CS measurements if u is highly compressible in the basis Ψ. The utility of this framework has motivated development over the last few years of several techniques for performing the CS inversion v → û [7]–[11]. Before proceeding, it should be emphasized that CS is a framework that is not limited to wavelet-based representations. While wavelets played a key role in early developments of sparse signal r... |

2559 | Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information
- Candès, Romberg, et al.
- 2006
Citation Context ...e signal directly, such that most unnecessary measurements are avoided from the start? This question has been answered in the affirmative, with this spawning the new field of compressive sensing (CS) [5], [6]. When performing compressive measurements, one does not attempt to directly measure the N dominant wavelet coefficients, as this would require adapting to each new signal. Rather, in a CS measur... |

2469 | Ten lectures on wavelets
- Daubechies
- 1992
Citation Context ...there have been significant advances in the development of orthonormal bases for compact representation of a wide class of discrete signals. An important example of this is the wavelet transform [1], [2], with which general signals are represented in terms of atomic elements localized in time and frequency, assuming that the data index represents time (it may similarly represent space). The localized... |

1479 | Bayesian Theory
- Bernardo, Smith
- 1994
Citation Context ...n In a Bayesian formulation our understanding of the fact that ws is sparse is formalized by placing a sparseness-promoting prior on ws. A widely used sparseness prior is the Laplace density function [12], [13]: p(w|λ) = (λ/2)^N exp(−λ ∑_{i=1}^N |w_i|), (5) where in (5) and henceforth we drop the subscript s on w, recognizing that we are always interested in a sparse solution for the weights. Given the CS mea... |
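The Laplace density in Eq. (5) is simple to evaluate, and its MAP estimate under a Gaussian likelihood is exactly ℓ1-regularized (lasso) regression, which is why this prior appears alongside the lasso reference. A small NumPy sketch; the function name is illustrative, not from the paper:

```python
import numpy as np

def laplace_log_prior(w, lam):
    """log p(w | lam) for the i.i.d. Laplace prior of Eq. (5):
    p(w | lam) = (lam/2)^N exp(-lam * sum_i |w_i|)."""
    return len(w) * np.log(lam / 2.0) - lam * np.sum(np.abs(w))

# MAP estimation with a Gaussian likelihood maximizes
# log N(v | Phi w, sigma^2 I) + log p(w | lam), i.e. (up to constants)
# it minimizes ||v - Phi w||^2 / (2 sigma^2) + lam * ||w||_1 -- the lasso.
w = np.array([0.0, 1.5, 0.0, -0.2])
print(laplace_log_prior(w, lam=1.0))  # sparser w yields a higher log-prior
```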

1292 | Least angle regression
- Efron, Hastie, et al.
- 2004
Citation Context ...t to zero with minimal impact on the reconstruction of ui). This relationship to linear regression makes existing sparse regression algorithms particularly relevant for CS inversion, for example [12]–[15]. Each of the CS measurements {vi}i=1,M yields a corresponding regression “task” vi → θ̂i, and performing multiple such learning tasks has been referred to in the machine-learning community as multi-... |

1204 | Probability, random variables, and stochastic processes (4th ed.) - Papoulis, Pillai - 2002 |

1179 | A Bayesian analysis of some nonparametric problems
- Ferguson
- 1973
Citation Context ... to model the statistics of the quadtrees, and the multi-task sharing mechanisms may be implemented using more-sophisticated MTL tools than those investigated here. For example, the Dirichlet process [37] has proven to be a very effective tool for multi-task learning; this type of model is also within the hierarchical Bayesian family, but with far more sophistication and generality than that considere... |

1108 | A new, fast, and efficient image codec based on set partitioning in hierarchical trees
- Said, Pearlman
- 1996
Citation Context ...he compressive properties of wavelets assures that ‖u − uN‖₂² is typically small for N ≪ m, thereby motivating the use of wavelets in a new generation of compression techniques for images and video [3], [4]. While wavelets have had a profound impact on practical compression schemes, there are issues that warrant further investigation. For example, while most natural signals are highly compressible ... |

947 | Sparse Bayesian learning and the relevance vector machine
- Tipping
- 2001
Citation Context ...be set to zero with minimal impact on the reconstruction of ui). This relationship to linear regression makes existing sparse regression algorithms particularly relevant for CS inversion, for example [12]–[15]. Each of the CS measurements {vi}i=1,M yields a corresponding regression “task” vi → θ̂i, and performing multiple such learning tasks has been referred to in the machine-learning community as m... |

856 | The Dantzig selector: statistical estimation when p is much larger than n
- Candès, Tao
- 2007
Citation Context ...sparsity measure than the ℓ1-norm, and prove that even in the worst-case scenario, the RVM still outperforms the most widely used sparse representation algorithms, including BP 2 While previous works [23], [24] in CS do obtain ℓ2 error bounds for function estimates, the “error bars” may be more useful from a practical standpoint as discussed follows. 3 A simple modification to (10) is available from [... |

770 | Signal Recovery From Random Measurements Via Orthogonal Matching Pursuit - Tropp, Gilbert - 2007 |

715 | Bayesian interpolation - MacKay, DJC - 1992 |

659 | Multitask learning
- Caruana
- 1997
Citation Context ...CS measurements {vi}i=1,M yields a corresponding regression “task” vi → θ̂i, and performing multiple such learning tasks has been referred to in the machine-learning community as multi-task learning [16]; ûi satisfies ûi = Ψθ̂i. Typical approaches to information transfer among tasks include: sharing hidden nodes in neural networks [16]–[18], placing a common prior in hierarchical Bayesian models [1... |

520 | Gradient projection for sparse reconstruction: application to compressed sensing and other inverse problems
- Figueiredo, Nowak, et al.
- 2007
Citation Context ...easurements if u is highly compressible in the basis Ψ. The utility of this framework has motivated development over the last few years of several techniques for performing the CS inversion v → û [7]–[11]. Before proceeding, it should be emphasized that CS is a framework that is not limited to wavelet-based representations. While wavelets played a key role in early developments of sparse signal repres... |

463 | Bayes and Empirical Bayes Methods for Data Analysis
- Carlin, Louis
- 2000
Citation Context ...uggests an iterative algorithm that alternates between these global and local solutions, as outlined next. This framework is related to extensive research in statistics on empirical Bayesian analysis [32]. Specifically, all of the data {vi}i=1,M are used to constitute point estimates for the parameters α and α0. Using the point estimate for α, one may specify the prior on the weights θi, via (3). Usin... |

432 | A framework for learning predictive structures from multiple tasks and unlabeled data
- Ando, Zhang
- 2005
Citation Context ... nodes in neural networks [16]–[18], placing a common prior in hierarchical Bayesian models [19]–[22], sharing parameters of Gaussian processes [23], sharing a common structure on the predictor space [24], and structured regularization in kernel methods [25], among others. In statistics, the problem of combining information from similar but independent experiments has been studied in the field of meta... |

419 | Information-based objective functions for active data selection - MacKay - 1992 |

413 | Wavelet-based statistical signal processing using hidden Markov models
- Crouse, Nowak, et al.
- 1998
Citation Context ...cs, although the location of the similar quadtrees are shifted within the image, commensurate with the associated object shift in the original image. Statistical models such as the hidden Markov tree [36] may be used to model the statistics of the quadtrees, and the multi-task sharing mechanisms may be implemented using more-sophisticated MTL tools than those investigated here. For example, the Dirich... |

357 | Algorithms for simultaneous sparse approximation. Part I: Greedy pursuit - Tropp, Gilbert, et al. - 2006 |

336 | Primary, secondary, and meta-analysis of research
- Glass
- 1976
Citation Context ...ctured regularization in kernel methods [25], among others. In statistics, the problem of combining information from similar but independent experiments has been studied in the field of meta-analysis [26] for a variety of applications in medicine, psychology and education. Hierarchical Bayesian modeling is one of the most important methods for meta analysis [27]–[31]. Hierarchical Bayesian models prov... |

284 | The relevance vector machine
- Tipping
- 2000
Citation Context ...osterior density function for w and α0. For example, one may conveniently implement a Markov Chain Monte Carlo (MCMC) [17] or, more efficiently and approximately, a variational Bayesian (VB) analysis [18]. While the VB analysis is efficient relative to MCMC, in the RVM a type-II maximum-likelihood (ML) procedure is considered, with the objective of achieving highly efficient computations while still p... |

270 | Sparse solution of underdetermined linear equations by stagewise orthogonal matching pursuit. 2007. available online at http://www.dsp.ece.rice.edu/cs
- Donoho, Tsaig, et al.
Citation Context ...i.e., measures of uncertainty, adaptive design of projection, etc. [10]), (hierarchical) Bayesian analysis also provides a flexible framework for multi-task CS. Conventional CS inverse algorithms [7]–[9] typically employ a point estimate for θi, and therefore are not directly amenable for information transfer among related multiple CS tasks. In addition to developing a multi-task CS framework, a modi... |

269 | Sparse solutions to linear inverse problems with multiple measurement vectors - Cotter, Rao, et al. - 2005 |

246 | Learning multiple tasks with kernel methods
- Evgeniou, Micchelli, et al.
- 2005
Citation Context ...prior in hierarchical Bayesian models [19]–[22], sharing parameters of Gaussian processes [23], sharing a common structure on the predictor space [24], and structured regularization in kernel methods [25], among others. In statistics, the problem of combining information from similar but independent experiments has been studied in the field of meta-analysis [26] for a variety of applications in medici... |

239 | Signal reconstruction from noisy random projections
- Haupt, Nowak
Citation Context ...ty measure than the ℓ1-norm, and prove that even in the worst-case scenario, the RVM still outperforms the most widely used sparse representation algorithms, including BP 2 While previous works [23], [24] in CS do obtain ℓ2 error bounds for function estimates, the “error bars” may be more useful from a practical standpoint as discussed follows. 3 A simple modification to (10) is available from [25] by... |

224 | Approaches for Bayesian variable selection
- George, McCulloch
- 1997
Citation Context ...y efficient computations while still preserving accurate results. As one may note, the Bayesian linear model considered in RVM is essentially one of the simplified models for Bayesian model selection [19]–[21]. Although more accurate models may be desired, the main motivation of adopting the RVM is due to its highly efficient computation as discussed below. B. Bayesian CS Inversion via RVM Assume the ... |

189 | Signal recovery from partial information via orthogonal matching pursuit - Tropp, Gilbert - 2005 |

188 | Calibration and empirical Bayes variable selection - George, Foster |

162 | The Bayesian choice: from decision-theoretic foundations to computational implementation
- Robert
- 2001
Citation Context ...icient computations while still preserving accurate results. As one may note, the Bayesian linear model considered in RVM is essentially one of the simplified models for Bayesian model selection [19]–[21]. Although more accurate models may be desired, the main motivation of adopting the RVM is due to its highly efficient computation as discussed below. B. Bayesian CS Inversion via RVM Assume the hyper... |

156 | A Wavelet Tour of Signal Processing (2nd ed.)
- Mallat
- 1999
Citation Context ...e sensing (CS), Multi-task learning, Sparse Bayesian learning, Hierarchical Bayesian modeling, Modified relevance vector machine. I. INTRODUCTION The development of wavelets [1], [2] has had a significant impact on several areas of signal processing and compression. An important characteristic of wavelets is the sparse representation of most natural signals in terms of a wavelet ... |

150 | Bayesian Data Analysis (2nd ed.)
- Gelman, Carlin, et al.
- 2003
Citation Context ...ccomplished using the Laplace prior directly, since the Laplace prior is not conjugate to the Gaussian likelihood and hence the associated Bayesian inference may not be performed in closed form [12], [15]. This issue has been addressed previously in sparse Bayesian learning, particularly, with the relevance vector machine (RVM) [16]. Rather than imposing a Laplace prior on w, in the RVM a hierarchical... |

143 | Sparse Bayesian learning for basis selection
- Wipf, Rao
- 2004
Citation Context ...], [24] in CS do obtain ℓ2 error bounds for function estimates, the “error bars” may be more useful from a practical standpoint as discussed follows. 3 A simple modification to (10) is available from [25] by exploiting the matrix inverse identity, which leads to an O(K³) operation per iteration. Nonetheless, the iterative (EM) implementation still does not scale well. [8] and O... |

129 | Learning Gaussian processes from multiple tasks - Yu, Tresp, et al. - 2005 |

115 | Fast marginal likelihood maximisation for sparse bayesian models
- Tipping, Faul
- 2003
Citation Context ...owed by algebra, yields α_j^new = M / ∑_{i=1}^M µ_{i,j}², j ∈ {1, 2, . . . , m}, (15) and α_0^new = (∑_{i=1}^M n_i) / (∑_{i=1}^M ‖vi − Φiµi‖²), (16) where µi,j is the jth component of µi. 2) Fast Algorithm: Similar to [33], considering the dependence of L(α, α0) on a single hyperparameter αj, j ∈ {1, 2, . . . , m}, we can decompose Ci in (14) as Ci = α0⁻¹I + ∑_{k≠j} αk⁻¹Φi,kΦi,k^T + αj⁻¹Φi,jΦi,j^T = Ci,−j + α ... |
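The hyperparameter updates (15)-(16) quoted in this context are straightforward once the per-task posterior means µi are in hand. A NumPy sketch under that assumption; the function and variable names are illustrative, not from the paper:

```python
import numpy as np

def update_hyperparams(mus, Phis, vs):
    """One type-II ML update in the style of Eqs. (15)-(16):
       alpha_j = M / sum_i mu_{i,j}^2            (shared across the M tasks)
       alpha_0 = (sum_i n_i) / (sum_i ||v_i - Phi_i mu_i||^2)
    mus: list of per-task posterior mean vectors (each length m),
    Phis: per-task projection matrices, vs: per-task measurements."""
    M = len(mus)
    mus = np.asarray(mus)                       # shape (M, m)
    alpha = M / np.sum(mus**2, axis=0)          # one alpha_j per coefficient
    num = sum(len(v) for v in vs)               # total measurement count
    den = sum(np.sum((v - Phi @ mu)**2) for v, Phi, mu in zip(vs, Phis, mus))
    alpha0 = num / den                          # shared noise precision
    return alpha, alpha0
```

Because every task contributes to the same α, a coefficient that is consistently small across tasks gets a large αj and is pruned for all tasks at once, which is the information-sharing mechanism of the multi-task formulation.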

96 | Learning Internal Representations - Baxter - 1995 |

85 | An empirical bayesian strategy for solving the simultaneous sparse approximation problem - Wipf, Rao - 2007 |

71 | Efficient, Low-Complexity Image Coding with a Set-Partitioning Embedded Block Coder
- Pearlman, Islam, et al.
- 2004
Citation Context ...mpressive properties of wavelets assures that ‖u − uN‖₂² is typically small for N ≪ m, thereby motivating the use of wavelets in a new generation of compression techniques for images and video [3], [4]. While wavelets have had a profound impact on practical compression schemes, there are issues that warrant further investigation. For example, while most natural signals are highly compressible in a ... |

68 | A new view of automatic relevance determination - Wipf, Nagarajan - 2008 |

63 | Learning to learn with the informative vector machine
- Lawrence, Platt
- 2004
Citation Context ...information transfer among tasks include: sharing hidden nodes in neural networks [16]–[18], placing a common prior in hierarchical Bayesian models [19]–[22], sharing parameters of Gaussian processes [23], sharing a common structure on the predictor space [24], and structured regularization in kernel methods [25], among others. In statistics, the problem of combining information from similar but indep... |

56 | A method for combining inference across related nonparametric Bayesian models - Müller, Quintana, et al. - 2004 |

55 | Analysis of sparse Bayesian learning
- Faul, Tipping
- 2001
Citation Context ...more robust to the parameter setting than the original RVM formulation in Sec. II. As we will see soon, this modified formalism and the fast inference algorithm can be derived in a manner parallel to [34]. We define a zero-mean Gaussian prior for each component of θi, and define a Gamma prior on the noise precision α0: p(θi|α, α0) = ∏_{j=1}^m N(θi,j | 0, α0⁻¹αj⁻¹), (26) p(α0|a, b) = Ga(α0|a, b).... |
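The hierarchical prior of Eqs. (26)-(27) can be sampled directly, which makes visible the coupling it introduces: the noise precision α0 also scales the weight prior. A hedged NumPy sketch (Ga(a, b) is taken in the shape/rate parameterization, so NumPy's `scale` is 1/b; the names are illustrative, not from the paper):

```python
import numpy as np

def sample_theta(alpha, a, b, n_samples, rng):
    """Draw alpha0 ~ Ga(a, b) (rate parameterization), then each
    component theta_j ~ N(0, alpha0^{-1} alpha_j^{-1}) as in Eq. (26)."""
    alpha = np.asarray(alpha, dtype=float)
    alpha0 = rng.gamma(shape=a, scale=1.0 / b)   # numpy uses scale = 1/rate
    std = 1.0 / np.sqrt(alpha0 * alpha)          # shared alpha0 scales all j
    return rng.normal(0.0, std, size=(n_samples, len(alpha)))

rng = np.random.default_rng(1)
theta = sample_theta(alpha=[1.0, 100.0], a=2.0, b=2.0, n_samples=5000, rng=rng)
# The component with the larger alpha_j comes from a tighter prior:
print(theta.std(axis=0))
```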

52 | Learning multiple related tasks using latent independent component analysis
- Zhang, Ghahramani, et al.
- 2006
Citation Context ...i satisfies ûi = Ψθ̂i. Typical approaches to information transfer among tasks include: sharing hidden nodes in neural networks [16]–[18], placing a common prior in hierarchical Bayesian models [19]–[22], sharing parameters of Gaussian processes [23], sharing a common structure on the predictor space [24], and structured regularization in kernel methods [25], among others. In statistics, the problem ... |

47 | Adaptive sparseness using Jeffreys prior - Figueiredo - 2001 |

39 | Simultaneous sparse approximation via greedy pursuit,” ICASSP - Tropp, Gilbert, et al. - 2005 |

35 | Theory of Optimal Experiments
- Fedorov, V
- 1972
Citation Context ...lecting projection rK+1, with the goal of reducing uncertainty. Such a framework has been previously studied in the machine learning community under the name of experimental design or active learning [30]–[32]. Further, the error bars also give a way to determine how many measurements are enough for faithful CS reconstruction, i.e., when the change in the uncertainty is not significant, it may be assu... |

33 | Robust Bayesian mixture modelling - Bishop, Svensen - 2005 |

30 | Perspectives on sparse bayesian learning
- Wipf, Palmer, et al.
- 2003
Citation Context ...nly a relatively small set of wi, for which the corresponding αi remains relatively small, contribute for representation of g, and the level of sparseness (size of M) is determined automatically (see [22] for an interesting explanation from a variational approximation perspective). It is also important to note that, as a result of the type-II ML estimate (11), the point estimates (rather than the post... |

29 | Robust multi-task learning with t-processes - Yu, Tresp, et al. - 2007 |

28 | Collaborative ensemble learning: Combining collaborative and content-based information filtering via hierarchical Bayes
- Yu, Schwaighofer, et al.
- 2003
Citation Context ...6]; ûi satisfies ûi = Ψθ̂i. Typical approaches to information transfer among tasks include: sharing hidden nodes in neural networks [16]–[18], placing a common prior in hierarchical Bayesian models [19]–[22], sharing parameters of Gaussian processes [23], sharing a common structure on the predictor space [24], and structured regularization in kernel methods [25], among others. In statistics, the pro... |

28 | Simultaneous approximation by greedy algorithms - Leviatan, Temlyakov - 2006 |

26 | A nonparametric hierarchical Bayesian framework for information filtering - Yu, Tresp, et al. - 2004 |

26 | A Bayesian semiparametric model for random-effects meta analysis
- Burr, Doss
- 2005
Citation Context ...n studied in the field of meta-analysis [26] for a variety of applications in medicine, psychology and education. Hierarchical Bayesian modeling is one of the most important methods for meta analysis [27]–[31]. Hierarchical Bayesian models provide the flexibility to model both the individuality of tasks (experiments), and the correlations between tasks. Statisticians refer to this approach as “borrowi... |

26 | On the use of a priori information for sparse signal approximations - Escoda, Granai, et al. - 2006 |

22 | Variational Bayes for continuous hidden Markov models and its application to active learning
- Ji, Krishnapuram, et al.
- 2006
Citation Context ...ng projection rK+1, with the goal of reducing uncertainty. Such a framework has been previously studied in the machine learning community under the name of experimental design or active learning [30]–[32]. Further, the error bars also give a way to determine how many measurements are enough for faithful CS reconstruction, i.e., when the change in the uncertainty is not significant, it may be assumed t... |

14 | Combining information from several experiments with nonparametric priors
- Mallick, Walker
- 1997
Citation Context ...died in the field of meta-analysis [26] for a variety of applications in medicine, psychology and education. Hierarchical Bayesian modeling is one of the most important methods for meta analysis [27]–[31]. Hierarchical Bayesian models provide the flexibility to model both the individuality of tasks (experiments), and the correlations between tasks. Statisticians refer to this approach as “borrowing st... |

14 | Comparing the effects of different weight distributions on finding sparse representations - Wipf, Rao - 2006 |

10 | Nonparametric modeling of hierarchically exchangeable data - Hoff - 2003 |

9 | ℓ0-norm minimization for basis selection
- Wipf, Rao
Citation Context ...e concise signal representation and is likely one of the explanations for the improvement in sparsity demonstrated in the experiments (see Sec. V). In addition, recent theoretical analysis of the RVM [28], [29] indicates that the RVM provides a tighter approximation to the ℓ0-norm sparsity measure than the ℓ1-norm, and prove that even in the worst-case scenario, the RVM still outperforms the most wide... |

8 | Combining information from related regressions - Dominici, Parmigiani, et al. - 1997 |

3 | Empirical Bayes density regression - Dunson - 2007 |

1 | A model of inductive bias learning
- Baxter
- 2000
Citation Context ...o in the machine-learning community as multi-task learning [16]; ûi satisfies ûi = Ψθ̂i. Typical approaches to information transfer among tasks include: sharing hidden nodes in neural networks [16]–[18], placing a common prior in hierarchical Bayesian models [19]–[22], sharing parameters of Gaussian processes [23], sharing a common structure on the predictor space [24], and structured regularization... |

1 | Calibration and empirical Bayes variable selection - George, Foster - 2000 |

1 | Distributed compressed sensing,” Nov - Baron, Wakin, et al. |