## Non-Stationary Bayesian Modelling and Enhancement of Speech Signals (1999)

Citations: 4 (4 self)

### BibTeX

@TECHREPORT{Vermaak99non-stationarybayesian,

author = {J. Vermaak and C. Andrieu and A. Doucet and S. J. Godsill},

title = {Non-Stationary Bayesian Modelling and Enhancement of Speech Signals},

institution = {},

year = {1999}

}

### Abstract

This report applies time-varying autoregressive (TVAR) models with stochastically evolving parameters to the problem of speech modelling and enhancement. Two parameterisations of the TVAR coefficients are investigated: the standard one, i.e. the coefficients of the TVAR polynomial themselves, and one in terms of the characteristic roots of the TVAR polynomial (the system poles). The stochastic evolution models for the TVAR parameters are Markovian diffusion processes. The problem and estimation objectives are formulated within a Bayesian framework, and two efficient iterative algorithms are developed to achieve these objectives. The first is a Markov chain Monte Carlo (MCMC) algorithm that generates samples from the posterior distribution, from which the minimum mean square error (MMSE) estimates of the TVAR parameters and the clean speech can be computed. The second is a stochastic optimisation algorithm that computes the marginal maximum a posteriori (MMAP) estimate of the TVAR parameters; the clean speech can then be obtained by running a fixed-interval Kalman smoother with this estimate. In contrast to EM-type algorithms, the estimation schemes work without introducing a set of "missing data" (the clean speech in this case). Nevertheless, the computational complexity of each iteration is still linear in the number of samples in the analysis window. Performance measures based on predictive distributions are used in simulation studies to compare the modelling and signal reconstruction performance of the proposed TVAR models with that of the standard fixed-parameter AR model on both synthetic and real speech data sets.

Keywords: Speech enhancement, TVAR models, Non-stationary speech modelling
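To make the model class in the abstract concrete, the following is a minimal simulation sketch, not the report's algorithm. It assumes the simplest case of a Markovian evolution model, a first-order Gaussian random walk on the TVAR coefficients, with fixed excitation and observation noise levels. The function name `simulate_tvar` and all parameter names and default values are hypothetical choices made for illustration.

```python
import numpy as np


def simulate_tvar(T=200, a0=(1.2, -0.5), sigma_a=0.005,
                  sigma_e=1.0, sigma_v=0.5, seed=0):
    """Simulate a noisy TVAR process (illustrative assumptions only).

    Coefficients follow a Gaussian random walk: a_t = a_{t-1} + sigma_a * w_t.
    Clean signal:  x_t = sum_k a_t[k] * x_{t-k-1} + sigma_e * e_t.
    Observation:   y_t = x_t + sigma_v * v_t.
    """
    rng = np.random.default_rng(seed)
    p = len(a0)                      # model order
    a = np.empty((T, p))             # coefficient trajectory
    x = np.zeros(T)                  # clean signal
    a_t = np.asarray(a0, dtype=float)
    for t in range(T):
        # Random-walk step on the coefficients (assumed evolution model).
        a_t = a_t + sigma_a * rng.standard_normal(p)
        a[t] = a_t
        # The p most recent clean samples, zero before the start of the record.
        past = np.array([x[t - k] if t - k >= 0 else 0.0
                         for k in range(1, p + 1)])
        x[t] = a_t @ past + sigma_e * rng.standard_normal()
    # Additive white Gaussian observation noise.
    y = x + sigma_v * rng.standard_normal(T)
    return a, x, y


if __name__ == "__main__":
    a, x, y = simulate_tvar()
    print(a.shape, x.shape, y.shape)
```

Setting `sigma_a=0` recovers the standard fixed-parameter AR model that the report uses as its baseline. The sketch does not enforce stability of the time-varying polynomial; a pole-based parameterisation such as the one the abstract mentions is one way to keep that constraint explicit.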

### Citations

753 | Markov chains for exploring posterior distributions
- Tierney
- 1994
Citation Context ...Two classes of iterative algorithms that have been successfully applied before in the context of numerical Bayesian inference are the stochastic Markov chain Monte Carlo (MCMC) methods (Tanner, 1993; Tierney, 1994) and simulated annealing optimisation algorithms (van Laarhoven and Aarts, 1987). Here one MCMC and one simulated annealing optimisation algorithm are developed to perform minimum mean square error (M... |

413 |
Linear prediction: A tutorial review
- Makhoul
- 1975
Citation Context ...dge, UK. 3 A. Doucet is supported by EPSRC, UK. 1 Introduction: A widely used and popular model for the speech production system is the autoregressive (AR) process (Deller et al., 1993; Makhoul, 1975; Rabiner and Schafer, 1978). This model exploits the local correlations in a time series by forming the prediction for the current sample as a linear combination of the immediately preceding samples.... |

375 | Simulated Annealing: Theory and Applications - Laarhoven, Aarts - 1987 |

190 |
Covariance structure of the Gibbs sampler with applications to the comparison of estimators and augmentation schemes. Biometrika
- Liu, J, et al.
- 1994
Citation Context ...esponds to the case where $f(\theta_{0:T}) = \theta_{0:T}$. On the other hand, $\alpha^{\mathrm{MMSE}}_{0:T}(N)$ is the mixture estimator derived from $\{\theta^{(i)}_{0:T} : i \in N\}$, so that the result follows straightforwardly (see e.g. (Liu et al., 1994)). 3.4 TVAR Specifics: To apply the sampling algorithm of Section 3.1 to the TVAR signal model specified in Section 2, it remains to compute the proposal distribution $q_{\mathrm{MCMC}}(\theta_t)$ corresponding to th... |

113 |
Enhancement and bandwidth compression of noisy speech
- Lim, Oppenheim
- 1979
Citation Context ...n led to the development of many efficient speech enhancement and signal reconstruction algorithms (see (Godsill and Rayner, 1995; Godsill, 1997; Godsill and Rayner, 1998b; Godsill and Rayner, 1998a; Lim and Oppenheim, 1979; Vermaak and Niranjan, 1998) for some examples), and is hence also adopted here. Equation (1) describes a linear time-invariant acoustic system composed of a concatenation of equal-length, constant-d... |

52 |
Random coefficient autoregressive models: An introduction, Lecture notes in Statistics
- Nicholls, Quinn
- 1982
Citation Context ... processing, the deterministic models are not further considered here. Stochastically evolving coefficient models were proposed and developed by numerous authors (see e.g. (Kitagawa and Gersch, 1985; Quinn and Nicholls, 1981)), and applied to speech in e.g. (Dembo and Zeitouni, 1988). The coefficients are assumed to evolve in a Markovian manner, with a first-order Gaussian random walk being the most common choice. This m... |

30 |
A solution of the smoothing problem for linear dynamic systems
- Mayne
- 1966
Citation Context ...:t) ${P'}^{-1}_{t \mid t}(\theta_{t:T}) + I_{n_\alpha} \big]^{-1} \times \big[ m_{t \mid t-1}(\theta_{0:t}) - P_{t \mid t-1}(\theta_{0:t})\, {P'}^{-1}_{t \mid t}(\theta_{t:T})\, m'_{t \mid t}(\theta_{t:T}) \big]$. (39) This result was first established in (Mayne, 1966) where the linear smoothing problem is formulated as an optimisation problem, decomposed as one minimisation w.r.t. past data, and one w.r.t. future data. Later in (Fraser and Potter, 1969) it was su... |

29 |
Direct estimation of the vocal tract shape by inverse filtering of acoustic speech waveforms
- Wakita
- 1973
Citation Context ...s hence also adopted here. Equation (1) describes a linear time-invariant acoustic system composed of a concatenation of equal-length, constant-diameter non-dissipative tubes (Atal and Hanauer, 1971; Wakita, 1973). Thus, associated with the coefficients is a stylised articulatory configuration which remains fixed throughout the analysis interval. In reality, however, the vocal tract is continually changing, s... |

27 |
Smoothness Priors Analysis of Time
- Kitagawa, Gersch
- 1996
Citation Context ... i=1 $p(\theta_{i,t} \mid \theta_{i,t-1}), \ \forall t \in T$. (10) This evolution model is similar to the previously defined "smoothness priors" models for the TVAR coefficients, proposed in (Kitagawa and Gersch, 1985; Kitagawa and Gersch, 1996). It is more general, since the framework adopted here allows for arbitrary diffusion processes on all the model parameters. 2.3 TVAR Model Parameterisations: In this report two parameterisations for ... |

25 |
The fitting of nonstationary time-series models with time-dependent parameters
- Rao
- 1970
Citation Context ... are briefly considered next. Most of the deterministic methods consider the TVAR coefficients to be a linear combination of a set of known basis functions. Such an approach was first suggested in (Rao, 1970) and subsequently applied to speech processing in e.g. (Grenier, 1983; Hall et al., 1983; Liporace, 1975). Commonly employed basis functions are the Legendre polynomials, Fourier basis functions, dis... |

24 |
A smoothness priors time-varying (AR) coefficient modeling of nonstationary covariance time series
- Kitagawa, Gersch
- 1985
Citation Context ...lications, including speech processing, the deterministic models are not further considered here. Stochastically evolving coefficient models were proposed and developed by numerous authors (see e.g. (Kitagawa and Gersch, 1985; Quinn and Nicholls, 1981)), and applied to speech in e.g. (Dembo and Zeitouni, 1988). The coefficients are assumed to evolve in a Markovian manner, with a first-order Gaussian random walk being the ... |

14 |
Linear Estimation of Nonstationary Signals
- Liporace
- 1975
Citation Context ... standard AR process where the AR coefficients are allowed to vary with time. TVAR models were proposed in the context of speech modelling in e.g. (Grenier, 1983; Ha and Ann, 1995; Hall et al., 1983; Liporace, 1975). Since it allows for the continually changing behaviour of the speech signal, the TVAR model leads to more accurate signal representations. In addition, analysis over longer data frames is possible.... |

9 | Bayesian approach to parameter estimation and interpolation of time-varying autoregressive processes using the Gibbs sampler
- Rajan, Rayner, et al.
- 1997
Citation Context ...s, Fourier basis functions, discrete prolate spherical sequences (DPSS) and B-splines, all of which perform well if the signal statistics vary relatively slowly with time. In (Rajan and Rayner, 1996; Rajan et al., 1997) an approach is described where the basis functions are constructed, based on the data, leading to more accurate signal representations for general non-stationary signals. The projection of the signa... |

7 |
Tools for Statistical Inference, Second Edition
- Tanner
- 1993
Citation Context ...eir solution. Two classes of iterative algorithms that have been successfully applied before in the context of numerical Bayesian inference are the stochastic Markov chain Monte Carlo (MCMC) methods (Tanner, 1993; Tierney, 1994) and simulated annealing optimisation algorithms (van Laarhoven and Aarts, 1987). Here one MCMC and one simulated annealing optimisation algorithm are developed to perform minimum mean ... |

6 |
Convergence control techniques for MCMC algorithms
- Robert
- 1995
Citation Context ...Footnote 4: Methods for determining the burn-in period are beyond the scope of this report. The interested reader is referred to the vast body of literature on this subject, e.g. (Cowles and Carlin, 1996; Robert, 1995). Main text: ...distribution $p(y_t \mid y_{-t})$ can generally not be evaluated analytically, but if it is expressed as $p(y_t \mid y_{-t}) = p(y_{1:T}) / p(y_{-t}) = \Big( \int p(y_{-t},$ f... |

5 |
Generalized feature extraction for time-varying autoregressive models
- Rajan, Rayner
- 1996
Citation Context ... the Legendre polynomials, Fourier basis functions, discrete prolate spherical sequences (DPSS) and B-splines, all of which perform well if the signal statistics vary relatively slowly with time. In (Rajan and Rayner, 1996; Rajan et al., 1997) an approach is described where the basis functions are constructed, based on the data, leading to more accurate signal representations for general non-stationary signals. The pro... |

5 | Markov Chain Monte Carlo methods for speech enhancement - Vermaak, Niranjan - 1998 |