## Patch-based Video Processing: a Variational Bayesian Approach

Citations: 4 (0 self)

### BibTeX

@MISC{Li_patch-basedvideo,

author = {Xin Li and Yunfei Zheng},

title = {Patch-based Video Processing: a Variational Bayesian Approach},

year = {}

}

### Abstract

In this paper, we present a patch-based variational Bayesian framework for video processing and demonstrate its potential in denoising, inpainting, and deinterlacing. Unlike previous methods based on explicit motion estimation, we propose to embed motion-related information into the relationship among video patches and develop a nonlocal sparsity-based prior for typical video sequences. Specifically, we first extend block matching (nearest-neighbor search) into patch clustering (k-nearest-neighbor search), which represents motion in an implicit and distributed fashion. Then we show how to exploit the sparsity constraint by sorting and packing similar patches, which can be better understood from a manifold perspective. Under the Bayesian framework, we treat both the patch clustering result and the unobservable data as latent variables and solve the inference problem via variational EM algorithms. A weighted averaging strategy for fusing diverse inference results from overlapped patches is also developed. The effectiveness of patch-based video models is demonstrated by extensive experimental results on a wide range of video materials.

Index Terms: video processing, patch-based models, sparsity-based priors, variational Bayesian, variational EM, weighted averaging.
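The step from block matching to patch clustering described in the abstract can be illustrated with a minimal sketch. It assumes toy 1D "frames" and brute-force search; the function names (`extract_patches`, `knn_cluster`) are hypothetical and are not taken from the paper, which works on spatio-temporal patches.

```python
from typing import List, Sequence, Tuple

Patch = Tuple[float, ...]


def extract_patches(frames: Sequence[Sequence[float]], size: int) -> List[Patch]:
    """Slide a window of `size` samples over each frame (toy stand-in for 2D/3D patches)."""
    patches = []
    for frame in frames:
        for start in range(len(frame) - size + 1):
            patches.append(tuple(frame[start:start + size]))
    return patches


def squared_distance(a: Patch, b: Patch) -> float:
    """Sum of squared differences between two equally sized patches."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def knn_cluster(reference: Patch, candidates: Sequence[Patch], k: int) -> List[Patch]:
    """Return the reference followed by its k nearest candidates, sorted by distance.

    With k = 1 this reduces to ordinary block matching (nearest-neighbor search);
    larger k keeps several plausible matches, so motion stays implicit and distributed.
    """
    ranked = sorted(candidates, key=lambda p: squared_distance(reference, p))
    return [reference] + ranked[:k]
```

For example, with an edge that shifts by one sample between two frames, `knn_cluster((0, 5, 5), extract_patches([[0, 0, 0, 5, 5, 0]], 3), 2)` packs the reference with its exact match and the next-closest patch into one array of k + 1 similar patches, which is the unit a sparsifying transform would then operate on.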

### Citations

1827 | Atomic decomposition by basis pursuit
- Chen, Donoho, et al.
- 1998
Citation Context ... before reducing it. Such counter-intuitive strategy is the consequence of patch clustering (an extension of block matching) that adapts signal representations to fixed bases instead of basis pursuit [13] (adapt bases to fixed signal representations). • A variational Bayesian approach towards patch-based video restoration. We treat noisy or incomplete data under a unified Bayesian framework, which nat...

1808 | A global geometric framework for nonlinear dimensionality reduction
- Tenenbaum, Silva, et al.
- 2000
Citation Context ...ch geometric interpretation might skip this part without any loss of continuity). The manifold language is widely used in the literature of nonlinear dimensionality reduction techniques (e.g., Isomap [23] and locally linear embedding [24]) and its implications into statistical modeling of natural images have also been briefly discussed in [25]. Loosely speaking, the manifold constraint of p(B) states t...

1498 | Robot Vision
- Horn
- 1986
Citation Context ...CTION A. From explicit to implicit motion representation Motion representation plays a fundamental role in digital processing of video signals. Correspondence-based motion models such as optical flow [1] and block motion field [2] have been the most popular in the literature of video coding and processing. In correspondence-based models, motion information is explicitly represented by either velocity...

1435 | Data clustering: A review
- Jain, Murty, et al.
- 1999
Citation Context ...eighborhoods on the manifold (denoted by the black and gray color in Fig. 2a), their 1D subspace structure (tangent direction) varies, which calls for the adaptation (e.g., data clustering via k-means [27], local PCA analysis [28]). However, such adaptation blows up the computational burden easily (just like Karhunen-Loeve transform [29] is theoretically optimal but practically infeasible). In view of ...

938 | Ideal spatial adaptation by wavelet shrinkage
- Donoho, Johnstone
- 1994
Citation Context ...on [31] at the patch level (formally we denote it by E[x|Dy]). Many existing methods of enforcing the sparsity constraint by coring operators [32] (e.g., Wiener filtering [33], soft/hard thresholding [34], [35]) can be used to restore a collection of clustered patches (refer to Fig. 3a). What has been less investigated is the consistency issue among overlapping patches (refer to Fig. 3b). In the liter...

849 | Adaptive mixtures of local experts
- Jacobs, Jordan, et al.
- 1991
Citation Context ...unction of p(Bz|Dz′) is adopted because it does not depend on Bz. Note that if an exponential function is used for p(Dz′), Eq. (5) is similar to the softmax function used in mixture of expert model [38]. Substituting Eq. (4) into Eq. (2), we can have the following simplified result $p(x \mid y) = \sum_i a_{(i)}\, p(x \mid D_i)$ (6), where parenthesized subscript i stands for the i-th hypothesis (it is the compounding re...
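The mixture form in Eq. (6) of the excerpt above implies that a point estimate is a weighted average of per-hypothesis estimates. The sketch below is only an illustration of that idea under stated assumptions: the weights are taken to be a softmax of negative mismatch scores, and the `bandwidth` parameter and both function names are hypothetical, not the paper's definitions of Eq. (5).

```python
import math
from typing import List, Sequence


def softmax_weights(distances: Sequence[float], bandwidth: float) -> List[float]:
    """Map per-hypothesis mismatch scores to weights a_(i) that sum to one.

    Assumption: an exponential (softmax) weighting, as the excerpt suggests is
    one possible choice; smaller mismatch gives a larger weight.
    """
    scores = [-d / bandwidth for d in distances]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_estimates(estimates: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of per-hypothesis estimates, i.e. the mean of the mixture."""
    return sum(w * x for w, x in zip(weights, estimates))
```

Two hypotheses with equal mismatch receive equal weights, so `fuse_estimates([10.0, 20.0], softmax_weights([0.0, 0.0], 1.0))` returns their plain average; as one mismatch grows, the fused value moves toward the better-matching hypothesis.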

823 | An Optimal Algorithm for Approximate Nearest Neighbor Searching
- Arya, Mount, et al.
- 1994
Citation Context ...on video, it could take days instead of hours to run our non-optimized MATLAB implementation. Fortunately, fast NN/kNN search algorithms have been extensively studied in the literature (e.g., kd-tree [84]), which support the practical feasibility of our approach. A variant of kNN search is to collect all neighbors within an ε-ball in the Euclidean space which also admits a fast search algorithm suitab...

811 | A view of the EM algorithm that justifies incremental, sparse, and other variants, ser
- Neal, Hinton
- 1998
Citation Context ...ods (e.g., [14]), we can formulate the following dual MAP estimation problems $\hat{T} = \arg\max_T \sum_x p(T, x \mid y)$, $\hat{x} = \arg\max_x \sum_T p(x, T \mid y)$ (10), which can be solved iteratively by a variant of EM algorithm [40]. The key idea is to obtain a lower bound for $L(T) = \ln p(y \mid T) = \ln \int p(x, y \mid T)\,dx$ which is a functional of free distributions of latent variables. Using Jensen's inequality, it can be shown that any p...
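The lower bound mentioned in the excerpt above follows the standard variational EM argument. The derivation below is a sketch of that textbook step, not a reproduction of the paper's equations; the free distribution $q(x)$ and the name $F(q, T)$ are notation assumed here for illustration.

```latex
\begin{aligned}
L(T) &= \ln \int p(x, y \mid T)\,dx
      = \ln \int q(x)\,\frac{p(x, y \mid T)}{q(x)}\,dx \\
     &\ge \int q(x)\,\ln \frac{p(x, y \mid T)}{q(x)}\,dx \;=:\; F(q, T)
      \qquad \text{(Jensen's inequality, since $\ln$ is concave)} \\
L(T) - F(q, T) &= \mathrm{KL}\!\left(q(x)\,\big\|\,p(x \mid y, T)\right) \;\ge\; 0 .
\end{aligned}
```

Under this view, an E-step tightens the bound by moving $q(x)$ toward $p(x \mid y, T)$ with $T$ fixed, and an M-step raises the bound by updating $T$ with $q$ fixed, so alternating the two never decreases $F$.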

572 | Differential Geometry of Curves and Surfaces
- do Carmo
- 1976
Citation Context ...the collection of patches - globally they occupy a 2D space but locally they are fully characterized by the 1D subspace structure (more formally, a manifold is locally homeomorphic to Euclidean space [26]). Despite the simplification of this toy example, it is sufficient to justify: 1) Why more data points in the neighborhood are needed to discover the local subspace structure? Determining the tangent...

481 | Learning low-level vision
- Freeman, Pasztor, et al.
Citation Context ...xed bases. Even though such strategy seemingly increased the dimensionality, the overall sparsity actually improves since the data array involves nonlocal patches distant from each other. When such nonlocal sparsity is combined with distributed motion representation ... (Footnote 9: It plays a similar role to the compatibility function in Markov network [15] and the hidden mapping function in epitome analysis [82].)

407 | Image inpainting
- Bertalmio, Sapiro, et al.
- 2000
Citation Context ...oncealment, we note that Algorithm 2 has three salient features: • It recovers missing data in a parallel instead of sequential fashion. The majority of inpainting techniques in the literature (e.g., [66], [67], [35], [5]) sequentially recover the missing data, which often suffers from the notorious error propagation problem (because previously-recovered data are often used to facilitate the recovery ...

391 | Image denoising using scale mixtures of gaussians in the wavelet domain
- Portilla, Strela, et al.
- 2003
Citation Context ...own T. A. Combining Multi-hypothesis Inference via Weighted Averaging With the sparsity-based prior on transform coefficients S, it is straightforward to obtain Bayesian Least-Square (BLS) estimation [31] at the patch level (formally we denote it by E[x|Dy]). Many existing methods of enforcing the sparsity constraint by coring operators [32] (e.g., Wiener filtering [33], soft/hard thresholding [34], [...

359 | Error Control and Concealment for Video Communications: A Review
- Wang, Zhu
- 1998
Citation Context (figure caption fragment: ...B), denoised by Alg. 1 (PSNR = 26.55 dB); bottom (mobile), from left to right: original, noisy, denoised by K-SVD [12] (PSNR = 25.96 dB), denoised by Alg. 1 (PSNR = 26.06 dB)) ... information (e.g., [61], [62]) or adopt specially designed ME methods (e.g., [63], [64], [65]). Since EC is performed at the decoder as a means of fighting against channel impairments, it does not impose any constraint on motion ...

321 | Image denoising via sparse and redundant representations over learned dictionaries
- Elad, Aharon
- 2006
Citation Context ...o different patches in $\mathbb{R}^N$ but their associated data arrays D, D′ are aligned along the same orientation in $\mathbb{R}^{2N}$). ... on local bases that are either pre-fixed (e.g., DCT and WT) or adaptive (e.g., [20], [21], [22]), ours can be viewed as a nonlocal adaptive approach - we still use pre-fixed bases but achieve the adaptation by nonlocally transforming signal representations with the aid of latent variable ...

298 | Variational algorithms for approximate Bayesian inference
- Beal
- 2003
Citation Context ...iew of the uncertainty with the patch clustering result and unobservable data, we propose to treat them both as latent variables and solve the Bayesian inference problem via variational EM algorithms [14]. Such rigorous treatment gives video restoration a solid ground for objective performance optimization. • A weighted averaging strategy of fusing diverse inference results under patch-based video mod...

266 | Adaptive wavelet thresholding for image denoising and compression
- Chang, Yu, et al.
- 2000
Citation Context ...tegies of enforcing the sparsity constraint in the transform domain: thresholding and filtering. In the former case, E[x|Dy] is simply obtained by hard thresholding the transform coefficients S [34], [55]; in the latter case, MMSE estimation is given by the classical Wiener filtering formula [32], [31]. Since Wiener filtering requires the knowledge about the variance of clean signal, BM3D method [18] ...
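The two strategies named in the excerpt above, thresholding and filtering, can be contrasted in a short sketch. It assumes the transform coefficients are already available as a flat list and that the noise variance is known and positive; the function names are hypothetical, and the per-coefficient `signal_var` input is exactly the clean-signal knowledge that the excerpt says Wiener filtering requires.

```python
from typing import List, Sequence


def hard_threshold(coeffs: Sequence[float], threshold: float) -> List[float]:
    """Keep coefficients whose magnitude exceeds the threshold; zero the rest."""
    return [c if abs(c) > threshold else 0.0 for c in coeffs]


def wiener_shrink(
    coeffs: Sequence[float], signal_var: Sequence[float], noise_var: float
) -> List[float]:
    """Scale each coefficient by signal_var / (signal_var + noise_var).

    This is the classical Wiener gain; noise_var is assumed to be positive so
    the denominator never vanishes.
    """
    return [c * s / (s + noise_var) for c, s in zip(coeffs, signal_var)]
```

Hard thresholding needs only a threshold, whereas the Wiener gain needs a variance estimate for every coefficient. One common workaround, stated here as an assumption rather than as the paper's procedure, is to use the squared output of a thresholding pass as that estimate.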

247 | Image denoising by sparse 3-d transform-domain collaborative filtering
- Dabov, Foi, et al.
Citation Context ...ased video models in several applications, their optimality in the presence of noisy or incomplete data becomes questionable. Inspired by the recent advance in patch-based image denoising (e.g., BM3D [18]), we propose a transform-based approach of obtaining the sparsity constraint (instead of linear expansion) for the kNNs of a given patch. Specifically, the clustered k + 1 patches are sorted and pack...

241 | A Non-Local Algorithm for Image Denoising
- Buades, Coll, et al.

237 | Fields of experts: A framework for learning image priors
- Roth, Black
- 2005
Citation Context ...f removing additive white Gaussian noise $N(0, \sigma_w^2)$. Such denoising problem has been extensively studied for still images (e.g., wavelet-based [32], [31], patch-based [42], [18], [43], learning-based [44], [21], [45], [46]); video denoising has also received increasingly more attention in recent years... (Footnote 6: Additional consideration is needed for smooth regions; we will elaborate on this issue in denoising application later.)

207 | Noise removal via Bayesian wavelet coring
- Simoncelli, Adelson
- 1996
Citation Context ...ightforward to obtain Bayesian Least-Square (BLS) estimation [31] at the patch level (formally we denote it by E[x|Dy]). Many existing methods of enforcing the sparsity constraint by coring operators [32] (e.g., Wiener filtering [33], soft/hard thresholding [34], [35]) can be used to restore a collection of clustered patches (refer to Fig. 3a). What has been less investigated is the consistency issue ...

205 | A Wavelet Tour
- Mallat
- 1999
Citation Context ...tches of similar appearance, it can be sparsified by standard bases with good localization properties - e.g., fast Fourier transform (FFT), discrete cosine transform (DCT) and wavelet transforms (WT) [19]. At first sight, such sparsification strategy is counterintuitive because the dimensionality apparently increases due to packing. However, we note that the modeling unit - data array D - can be v...

201 | Hierarchical Bayesian inference in the visual cortex
- Lee, Mumford
- 2003
Citation Context ...l study the application of this new patch-based model into several video restoration tasks. Since those restoration algorithms are related to the degradation model p(y|x) (also called “imaging model” [41]) which varies from application to application, we will emphasize the varying considerations in algorithm development and implementation details in different scenarios. IV. APPLICATIONS In this sectio...

192 | Region filling and object removal by exemplar-based inpainting
- Criminisi, Perez, et al.
- 2004
Citation Context ...ment, we note that Algorithm 2 has three salient features: • It recovers missing data in a parallel instead of sequential fashion. The majority of inpainting techniques in the literature (e.g., [66], [67], [35], [5]) sequentially recover the missing data, which often suffers from the notorious error propagation problem (because previously-recovered data are often used to facilitate the recovery of the...

134 | A simple algorithm for nearest neighbor search in high dimensions
- Nene, Nayar
- 1997
Citation Context ...lity of our approach. A variant of kNN search is to collect all neighbors within an ε-ball in the Euclidean space which also admits a fast search algorithm suitable for hardware implementation (e.g., [85]). On the other hand, we believe many existing fast algorithms developed for block matching can be borrowed to speed up the search of kNNs especially when some prior knowledge is available (e.g., smoo...

128 | Generalized principal component analysis (GPCA)
- Vidal, Ma, et al.
Citation Context ...ection regardless of the patch content. Therefore, even pre-fixed bases in $\mathbb{R}^{2N}$ are sufficient for sparsifying the arrays consisting of clustered patches. When compared with generalized PCA (GPCA) [30], we note that our goal here is regression instead of classification. Therefore, any point on the manifold is never associated with a hard label but loosely connects multiple points in its neighborhoo...

121 | Temporal texture modeling
- Szummer, Picard
- 1996
Citation Context ...nstead of explicit motion representations. Implicit motion representations can be roughly classified into two categories: local and nonlocal. In local approaches (e.g., spatio-temporal autoregressive [4] and its extension [5]), motion information is implicitly embedded ... (Page footer: Copyright (c) 2008 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes ...)

113 | On advances in statistical modeling of natural images
- Srivastava, Lee, et al.
- 2003
Citation Context ...nonlinear dimensionality reduction techniques (e.g., Isomap [23] and locally linear embedding [24]) and its implications into statistical modeling of natural images have also been briefly discussed in [25]. Loosely speaking, the manifold constraint of p(B) states that the collection of spatiotemporal patches do not span the full-dimensional space $\mathbb{R}^N$ but a much lower-dimensional manifold M. To facilita...

110 | Space–time video completion
- Wexler, Shechtman, et al.
- 2004
Citation Context ...we conjecture an extension of VBM3D implementation into 3D patches could lead to even further improved denoising performance, which is left as the future work. B. Video Inpainting Video inpainting [59], [60], [9] refers to a class of techniques of recovering missing data in some specified region of video (often called inpainting domain $\Omega_I$). When inpainting domain consists of isolated or consecutive block...

109 | Dimension reduction by local principal component analysis
- Kambhatla, Leen
- 1997
Citation Context ...old (denoted by the black and gray color in Fig. 2a), their 1D subspace structure (tangent direction) varies, which calls for the adaptation (e.g., data clustering via k-means [27], local PCA analysis [28]). However, such adaptation blows up the computational burden easily (just like Karhunen-Loeve transform [29] is theoretically optimal but practically infeasible). In view of the difficulties with ada...

96 | Nonlinear Dimensionality Reduction by Locally
- Roweis, Saul
- 2000
Citation Context ... skip this part without any loss of continuity). The manifold language is widely used in the literature of nonlinear dimensionality reduction techniques (e.g., Isomap [23] and locally linear embedding [24]) and its implications into statistical modeling of natural images have also been briefly discussed in [25]. Loosely speaking, the manifold constraint of p(B) states that the collection of spatiotempo...

71 | Overlapped block motion compensation - an estimation-theoretic approach
- Orchard, Sullivan
- 1994
Citation Context ...otally ignores the uncertainty arising from estimation errors. By contrast, kNN-based patch clustering diversifies the correspondence results - the benefits of such multiple hypotheses to video coding [17] or distributed representations to motion analysis [3] have been well understood in the literature. Here, we argue that another advantage of considering kNN instead of NN is to facilitate the exploita...

68 | Fast image and video denoising via nonlocal means of similar neighborhoods
- Mahmoudi, Sapiro
- 2005
Citation Context ... into 3D autoregressive coefficients that are learned within a localized spatio-temporal window. Nonlocal approaches are mostly patch-based (e.g., nonlocal mean [6], [7], [8], video epitome [9], space-time adaptive filtering [10], VBM3D [11], K-SVD [12]): motion-related temporal dependency is implicitly characterized by the clustering relationship among video patches. Clu...

62 | Distributed Representation and Analysis of Visual Motion, Ph.D. dissertation, Vision and Modeling Group
- Simoncelli
- 1993
Citation Context ...sentation is conceptually simple but suffers from a fundamental weakness - i.e., due to the difficulty of establishing correct correspondence for every pixel, the uncertainty in motion representation [3] is inevitable and often difficult to address especially in low-level vision tasks where motion is exploited to resolve intensity-related uncertainties. In recent years, there have been a flurry of wo...

61 | Learning multiscale sparse representations for image and video restoration. SIAM Multiscale Modeling and Simulation
- Mairal, Sapiro, et al.
Citation Context ...e white Gaussian noise $N(0, \sigma_w^2)$. Such denoising problem has been extensively studied for still images (e.g., wavelet-based [32], [31], patch-based [42], [18], [43], learning-based [44], [21], [45], [46]); video denoising has also received increasingly more attention in recent years... (Footnote 6: Additional consideration is needed for smooth regions; we will elaborate on this issue in denoising application later.)

56 | Video denoising using 2d and 3d dual-tree complex wavelet transforms
- Selesnick, Li
- 2003
Citation Context (Footnote 6 fragment: ...l consideration is needed for smooth regions; we will elaborate on this issue in denoising application later.) ... video denoising has also received increasingly more attention in recent years (e.g., [47], [48], [8], [49], [50], [51], [52], [53], [10], [54], [11], [12]). Our approach is closely related to BM3D denoising [18] and K-SVD denoising [12] - the ideas of patch clustering and sparsifying transform ...

48 | Pointwise Shape-Adaptive DCT for High-Quality Denoising and Deblocking of Grayscale and Color Images
- Foi, Katkovnik, et al.
- 2007
Citation Context ...ble motion)? How to adaptively choose the size and shape of patch for different sequences and applications? Is it worthwhile to use irregular patches such as anisotropic support in shape-adaptive DCT [83]? All these questions are related to the maximization of sparsity in patch-based video models. The other closely related question is how to choose the neighborhood size $k = |N(B_0)|$. In our experiment...

47 | Adaptive deblocking filter
- List, Joch, et al.
- 2003
Citation Context ...liasing artifacts could also violate the sparsity constraint. Consequently, we believe patch-based video models can also find other successful applications such as post-processing of compressed video [81] and video super-resolution [9]. (Figure caption: Fig. 14. Subjective quality comparison for a 150 × 150 portion of the 21st frame in hall (top) and mobile (bottom) CIF sequences: a) original; b) DIM...)

42 | Nonlinear approximation based image recovery using adaptive sparse reconstructions and iterated denoising: Part ii - adaptive algorithms
- Guleryuz
Citation Context ...ent two different patches in $\mathbb{R}^N$ but their associated data arrays D, D′ are aligned along the same orientation in $\mathbb{R}^{2N}$). ... on local bases that are either pre-fixed (e.g., DCT and WT) or adaptive (e.g., [20], [21], [22]), ours can be viewed as a nonlocal adaptive approach - we still use pre-fixed bases but achieve the adaptation by nonlocally transforming signal representations with the aid of latent var...

41 | Video epitomes
- Cheung, Frey, et al.
- 2005
Citation Context ... into 3D autoregressive coefficients that are learned within a localized spatio-temporal window. Nonlocal approaches are mostly patch-based (e.g., nonlocal mean [6], [7], [8], video epitome [9], space-time adaptive filtering [10], VBM3D [11], K-SVD [12]): motion-related temporal dependency is implicitly characterized by the clustering relationship among video patches. Clustered patches are ...

35 | Spatially adaptive image denoising under overcomplete expansion
- Li, Orchard
- 2000
Citation Context ...n Least-Square (BLS) estimation [31] at the patch level (formally we denote it by E[x|Dy]). Many existing methods of enforcing the sparsity constraint by coring operators [32] (e.g., Wiener filtering [33], soft/hard thresholding [34], [35]) can be used to restore a collection of clustered patches (refer to Fig. 3a). What has been less investigated is the consistency issue among overlapping patches (re...

33 | Temporal and spatial error concealment techniques for hierarchical MPEG-2 video codec
- Aign, Fazel
- 1995
Citation Context ...al results next). • It relies on implicit instead of explicit motion representation. Although many concealment techniques for video have conveniently assumed the knowledge of block-based motion field [69], we argue that such assumption does more harm than good because block-based motion field received at the decoder is at most an approximation of the true one. A better strategy is to exploit temporal ...

29 | Image sequence denoising via sparse and redundant representations
- Protter, Elad
- 2009

29 | Deinterlacing-an overview
- Haan, Bellers
- 1998
Citation Context ...tennis sequence containing camera zoom is particularly impressive. C. Video Deinterlacing Deinterlacing is another problem that has been extensively studied in the literature (e.g., [71], [72], [73], [74], [75], [76], [77]) but remained challenging. In view of the large amount of video material stored in the interlaced format, deinterlacing remains an important technique which can help ...

27 | Multiscale hybrid linear models for lossy image representation
- Hong, Wright, et al.
Citation Context ...erent patches in $\mathbb{R}^N$ but their associated data arrays D, D′ are aligned along the same orientation in $\mathbb{R}^{2N}$). ... on local bases that are either pre-fixed (e.g., DCT and WT) or adaptive (e.g., [20], [21], [22]), ours can be viewed as a nonlocal adaptive approach - we still use pre-fixed bases but achieve the adaptation by nonlocally transforming signal representations with the aid of latent variable T. In ...

24 | Denoising image sequences does not require motion estimation
- Buades, Coll, et al.
Citation Context ... into 3D autoregressive coefficients that are learned within a localized spatio-temporal window. Nonlocal approaches are mostly patch-based (e.g., nonlocal mean [6], [7], [8], video epitome [9], space-time adaptive filtering [10], VBM3D [11], K-SVD [12]): motion-related temporal dependency is implicitly characterized by the clustering relationship among video patches...

24 | Bayesian Non-Local Means Filter, Image Redundancy and Adaptive Dictionaries for Noise Removal
- Kervrann, Boulanger, et al.
- 2007
Citation Context ...e classical problem of removing additive white Gaussian noise $N(0, \sigma_w^2)$. Such denoising problem has been extensively studied for still images (e.g., wavelet-based [32], [31], patch-based [42], [18], [43], learning-based [44], [21], [45], [46]); video denoising has also received incr... (Footnote 6: Additional consideration is needed for smooth regions; we will elaborate on this issue in denoising application later.)

24 | Error Concealment Techniques for encoded video streams
- Salama, Shroff, et al.
- 1995
Citation Context (figure caption fragment: ...25.31dB), denoised by Alg. 1 (PSNR = 26.55 dB); bottom (mobile), from left to right: original, noisy, denoised by K-SVD [12] (PSNR = 25.96 dB), denoised by Alg. 1 (PSNR = 26.06 dB)) ... information (e.g., [61], [62]) or adopt specially designed ME methods (e.g., [63], [64], [65]). Since EC is performed at the decoder as a means of fighting against channel impairments, it does not impose any constraint on m...

19 | Space-time adaptation for patch-based image sequence restoration
- Boulanger, Kervrann, et al.

19 | Video denoising by sparse 3D transform-domain collaborative filtering
- Dabov, Foi, et al.
- 2007
Citation Context ...at are learned within a localized spatio-temporal window. Nonlocal approaches are mostly patch-based (e.g., nonlocal mean [6], [7], [8], video epitome [9], space-time adaptive filtering [10], VBM3D [11], K-SVD [12]): motion-related temporal dependency is implicitly characterized by the clustering relationship among video patches. Clustered patches are then described by parametric Gaussian distributi...

19 | Video denoising by combining Kalman and Wiener estimates
- Dugad, Ahuja
- 1999
Citation Context (Footnote 6 fragment: ...itional consideration is needed for smooth regions; we will elaborate on this issue in denoising application later.) ... video denoising has also received increasingly more attention in recent years (e.g., [47], [48], [8], [49], [50], [51], [52], [53], [10], [54], [11], [12]). Our approach is closely related to BM3D denoising [18] and K-SVD denoising [12] - the ideas of patch clustering and sparsifying tran...