Results 1–10 of 11
Single microphone source separation using high resolution signal reconstruction
In Proc. ICASSP ’04, 2004
Cited by 23 (3 self)
Abstract
We present a method for separating two speakers from a single microphone channel. The method exploits the fine structure of male and female speech and relies on a strong high-frequency-resolution model for the source signals. The algorithm is able to identify the correct combination of male and female speech that best explains an observation and to reconstruct the component signals, relying on prior knowledge to ‘fill in’ regions that are masked by the other speaker. The two-speaker, single-microphone source separation problem is one of the most challenging source separation scenarios, and few quantitative results have been reported in the literature. We provide a test set based on the Aurora 2 data set and report performance numbers on a portion of this set. We achieve an average SNR increase of 6.59 dB for female speakers and 5.51 dB for male speakers.
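The combination search the abstract describes (pick the pair of speaker-model states that best explains the mixture, then fill masked regions from the prior) can be sketched with a toy log-max model. The codebooks, their sizes, and the binary-mask reconstruction below are illustrative assumptions, not the paper's actual high-resolution models:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy per-speaker codebooks of log-power spectra (stand-ins for trained priors).
F, K = 8, 4                                   # frequency bins, codewords per speaker
codebook_a = rng.normal(0.0, 1.0, (K, F))     # speaker A model
codebook_b = rng.normal(0.0, 1.0, (K, F))     # speaker B model

# Under the log-max approximation, the mixture log-spectrum is roughly the
# element-wise max of the two source log-spectra.
true_a, true_b = codebook_a[1], codebook_b[2]
mix = np.maximum(true_a, true_b)

# Identify the codeword pair that best explains the observation.
best, best_err = None, np.inf
for i in range(K):
    for j in range(K):
        err = np.sum((mix - np.maximum(codebook_a[i], codebook_b[j])) ** 2)
        if err < best_err:
            best, best_err = (i, j), err

# Reconstruct speaker A: bins where A dominates come from the mixture, and
# bins masked by speaker B are 'filled in' from A's prior (the codeword).
i, j = best
est_a = np.where(codebook_a[i] >= codebook_b[j], mix, codebook_a[i])
```

Here the true pair is recovered exactly because the mixture was generated from the codebooks; with real speech the codebooks only approximate the sources, and the search runs per frame.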
Model-based scene analysis
In , 2006
Cited by 10 (2 self)
Abstract
The general problem of separating multiple sources mixed into a single channel is ill-posed; in order to solve it, additional constraints must be applied. One very general class of constraints is prior knowledge of the source signals: limitations on the waveforms possible for each source can disambiguate the mixture, since there may be only one set of ‘legal’ source signals that would add together to give the observed mixture. A particularly attractive aspect of this approach is that the models themselves may be learned from the environment by generalizing across observations of isolated sources. This chapter examines the general idea of using models in source separation, including what form these models can take and how they can be acquired, and describes examples of several systems that can be described within this framework.
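A minimal sketch of the ‘legal signals’ constraint, assuming each source is restricted to a tiny hand-picked set of waveforms (the numbers are arbitrary): without the priors, infinitely many decompositions explain the mixture; with them, enumeration leaves exactly one.

```python
import numpy as np

# Each source is constrained to a small set of "legal" waveforms (its prior).
legal_a = np.array([[1.0, 0.0, 2.0],
                    [0.0, 1.0, 1.0]])
legal_b = np.array([[3.0, 1.0, 0.0],
                    [1.0, 2.0, 2.0]])

mixture = legal_a[1] + legal_b[0]   # observed single channel

# Unconstrained, any a with b = mixture - a explains the observation.
# Constrained to the priors, enumerate the pairs that add up to the mixture.
consistent = [(i, j)
              for i in range(len(legal_a))
              for j in range(len(legal_b))
              if np.allclose(legal_a[i] + legal_b[j], mixture)]
```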
Speech Enhancement, Gain, and Noise Spectrum Adaptation Using Approximate Bayesian Estimation
Cited by 3 (0 self)
Abstract
This paper presents a new approximate Bayesian estimator for enhancing a noisy speech signal. The speech model is assumed to be a Gaussian mixture model (GMM) in the log-spectral domain, in contrast to most current models, which work in the frequency domain. Exact signal estimation is a computationally intractable problem, so we derive three approximations to enhance the efficiency of signal estimation. The Gaussian approximation transforms the log-spectral-domain GMM into the frequency domain using a minimal Kullback–Leibler (KL) divergence criterion. The frequency-domain Laplace method computes the maximum a posteriori (MAP) estimator for the spectral amplitude; correspondingly, the log-spectral-domain Laplace method computes the MAP estimator for the log-spectral amplitude. Further, gain and noise spectrum adaptation are implemented using the expectation–maximization (EM) algorithm within the GMM under the Gaussian approximation. The proposed algorithms are evaluated by applying them to enhance speech corrupted by speech-shaped noise (SSN). The experimental results demonstrate that the proposed algorithms offer an improved signal-to-noise ratio, a lower word recognition error rate, and less spectral distortion.
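The transformation of a log-spectral model into the power domain can be illustrated by moment matching, which is what a minimal-KL Gaussian approximation reduces to in one direction. A single Gaussian stands in for one GMM component, and the parameter values are arbitrary:

```python
import numpy as np

# One component models the log-spectrum as l ~ N(mu, s^2), so the power
# p = exp(l) is log-normal.  Matching its first two moments gives the
# Gaussian approximation's mean and variance in the power domain.
mu, s = 0.5, 0.3
p_mean = np.exp(mu + s**2 / 2)
p_var = (np.exp(s**2) - 1.0) * np.exp(2 * mu + s**2)

# Monte-Carlo check that the matched moments are the log-normal's moments.
rng = np.random.default_rng(1)
samples = np.exp(rng.normal(mu, s, 200_000))
```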
Audio-Visual Graphical Models for Speech Processing
Cited by 1 (0 self)
Abstract
Perceiving sounds in a noisy environment is a challenging problem. Visual lip-reading can provide relevant information but is also challenging, because lips are moving and a tracker must deal with a variety of conditions. Typically, audio-visual systems have been assembled from individually engineered modules. We propose to fuse audio and video in a probabilistic generative model that implements cross-modal self-supervised learning, enabling adaptation to audio-visual data. The video model features a Gaussian mixture model embedded in a linear subspace of a sprite which translates in the video. The system can learn to detect and enhance speech in noise given only a short (30-second) sequence of audio-visual data. We show some results for speech detection and enhancement, and discuss extensions to the model that are under investigation.
Nonnegative source-filter dynamical system for speech enhancement
, 2014
Cited by 1 (1 self)
Abstract
Model-based speech enhancement methods, which rely on separately modeling the speech and the noise, have been shown to be powerful in many different problem settings. When the structure of the noise can be arbitrary, which is often the case in practice, model-based methods have to focus on developing good speech models, whose quality is key to their performance. In this study, we propose a novel probabilistic model for speech enhancement which precisely models the speech by taking into account the underlying speech production process as well as its dynamics. The proposed model follows a source-filter approach where the excitation and filter parts are modeled as nonnegative dynamical systems. We present convergence-guaranteed update rules for each latent factor. In order to assess performance, we evaluate our model on a challenging speech enhancement task where the speech is observed under nonstationary noises recorded in a car. We show that our model outperforms state-of-the-art methods in terms of objective measures.
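The flavor of convergence-guaranteed multiplicative updates for nonnegative factors can be shown with plain NMF under squared error (the classic Lee–Seung rules) on arbitrary synthetic data; the paper's actual model adds source-filter structure and temporal dynamics on top of this:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy nonnegative "spectrogram" V, factorized as V ≈ W @ H.
F, T, K = 6, 20, 3
V = rng.random((F, K)) @ rng.random((K, T))   # exactly rank K, nonnegative
W = rng.random((F, K)) + 0.1                  # nonnegative initial factors
H = rng.random((K, T)) + 0.1

def loss(V, W, H):
    return float(np.sum((V - W @ H) ** 2))

# Multiplicative updates: each step keeps the factors nonnegative and is
# guaranteed not to increase the squared-error objective.
losses = [loss(V, W, H)]
for _ in range(200):
    H *= (W.T @ V) / (W.T @ W @ H + 1e-12)
    W *= (V @ H.T) / (W @ H @ H.T + 1e-12)
    losses.append(loss(V, W, H))
```

Because the updates are ratios of nonnegative terms, no projection step is needed to maintain nonnegativity, which is what makes this family attractive for latent factors like excitation and filter gains.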
Speech Enhancement by Indirect VTS
, 2012
Abstract
Model-based speech enhancement methods, such as vector Taylor series (VTS)-based methods, share a common methodology: they estimate the speech using the expected value of the clean speech given the noisy speech under a statistical model. We show that it may be better to use the expected value of the noise under the model and subtract it from the noisy observation to form an indirect estimate of the speech. Interestingly, for VTS, this methodology turns out to be related to the application of an SNR-dependent gain to the direct VTS speech estimate. In results obtained on an automotive noise task, this methodology produces an average improvement of 1.6 dB in signal-to-noise ratio relative to conventional methods.
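The direct/indirect distinction can be made concrete with a Monte-Carlo toy in the log-spectral domain, where the mixing y = log(exp(x) + exp(n)) is nonlinear and the two estimators genuinely differ. The priors, the observed value, and the rejection-sampling conditioning are all illustrative assumptions, not the paper's VTS machinery:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy log-spectral mixing y = log(exp(x) + exp(n)) with Gaussian priors on
# the clean log-spectrum x and the noise log-spectrum n.
x = rng.normal(2.0, 0.5, 100_000)
n = rng.normal(0.0, 0.5, 100_000)
y = np.log(np.exp(x) + np.exp(n))

# Condition on a narrow band around one observed value (crude rejection).
y_obs = 2.3
sel = np.abs(y - y_obs) < 0.02

# Direct estimate: posterior mean of the speech, E[x | y].
direct = x[sel].mean()

# Indirect estimate: posterior mean of the noise, subtracted from the
# observation in the power domain.
indirect = np.log(np.exp(y_obs) - np.exp(n[sel].mean()))
```

Because the mixing is nonlinear, the two estimates do not coincide; the abstract's claim is that the indirect route may be the better of the two.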
Speech Enhancement Using Gaussian Scale Mixture Models
Abstract
This paper presents a novel probabilistic approach to speech enhancement. Instead of a deterministic logarithmic relationship, we assume a probabilistic relationship between the frequency coefficients and the log-spectra. The speech model in the log-spectral domain is a Gaussian mixture model (GMM). The frequency coefficients obey a zero-mean Gaussian whose covariance equals the exponential of the log-spectra. This results in a Gaussian scale mixture model (GSMM) for the speech signal in the frequency domain, since the log-spectra can be regarded as scaling factors. The probabilistic relation between frequency coefficients and log-spectra allows these to be treated as two random variables, both to be estimated from the noisy signals. Expectation–maximization (EM) was used to train the GSMM, and Bayesian inference was used to compute the posterior signal distribution. Because exact inference in this full probabilistic model is computationally intractable, we developed two approaches to enhance the efficiency: the Laplace method and a variational approximation. The proposed methods were applied to enhance speech corrupted by Gaussian noise and speech-shaped noise (SSN). For both approximations, signals reconstructed from the estimated frequency coefficients provided a higher signal-to-noise ratio (SNR), while those reconstructed from the estimated log-spectra produced a lower word recognition error rate, because the log-spectra better fit the inputs to the recognizer. Our algorithms effectively reduced the SSN, which algorithms based on spectral analysis were not able to suppress. Index Terms—Gaussian scale mixture model (GSMM), Laplace method, speech enhancement, variational approximation.
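The generative structure described here (a GMM over log-spectra whose exponential scales a zero-mean Gaussian on the frequency coefficients) can be sketched directly; the two-component GMM below is an arbitrary stand-in for a trained speech model:

```python
import numpy as np

rng = np.random.default_rng(0)

# GMM over the log-spectrum z (two components, arbitrary parameters).
weights = np.array([0.4, 0.6])
means = np.array([-1.0, 1.0])
stds = np.array([0.3, 0.3])

# Sample the GSMM: z from the GMM, then x ~ N(0, exp(z)), so exp(z) acts
# as a random scale on the variance of the frequency coefficient.
N = 100_000
comp = rng.choice(2, size=N, p=weights)
z = rng.normal(means[comp], stds[comp])
x = rng.normal(0.0, np.sqrt(np.exp(z)))

# The marginal power matches E[exp(z)] = sum_k w_k exp(mu_k + s_k^2 / 2).
expected_power = float(np.sum(weights * np.exp(means + stds**2 / 2)))
```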
unknown title
Abstract
This paper introduces the Laplace algorithm for denoising in the cepstrum domain, with applications to speech recognition. Our method uses Gaussian mixture priors for the clean speech and noise cepstra and assumes that speech and noise mix linearly in the spectrum domain. The Laplace algorithm involves two steps: (a) computing the mode of the posterior given the observed noisy cepstra, and (b) forming a Gaussian approximation of the posterior around the mode. We show that the Algonquin algorithm is a special case of our approach in which a Newton method is used for (a); interestingly, this observation also proves that the Algonquin algorithm does not converge in general. We propose the use of the BFGS method for (a), which also allows us to apply the Laplace algorithm efficiently in the cepstral domain. Denoising in the cepstral domain gives more than a 31% relative reduction in word error rate on average on the Aurora 2 task.
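The two steps can be sketched on a deliberately simple 1-D problem with a single Gaussian prior (the paper uses Gaussian mixtures and a nonlinear mixing model); since this toy posterior is Gaussian, the Laplace approximation is exact, which makes the BFGS mode-finding step easy to check:

```python
import numpy as np
from scipy.optimize import minimize

# Toy setup: Gaussian prior on a clean cepstral value x, Gaussian likelihood
# for the noisy observation y | x ~ N(x, sn^2).
mu0, s0 = 1.0, 0.5
sn = 0.2
y = 1.4

def neg_log_post(x):
    return 0.5 * ((x - mu0) / s0) ** 2 + 0.5 * ((y - x) / sn) ** 2

# Step (a): find the posterior mode with BFGS.
res = minimize(lambda v: neg_log_post(v[0]), x0=[0.0], method="BFGS")
mode = res.x[0]

# Step (b): Gaussian approximation around the mode, with variance given by
# the inverse curvature of the negative log-posterior.
curv = 1.0 / s0**2 + 1.0 / sn**2
laplace_var = 1.0 / curv

# Closed-form check (this toy posterior is exactly Gaussian).
exact_mode = (mu0 / s0**2 + y / sn**2) / curv
```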
Perceptual Inference in Generative Models (Thesis Excerpts, Draft)
, 2004
Abstract
“Is the problem that we can’t see? Or is it that the problem is beautiful to me?” —David C. Berman

Everything we learn about the world arrives via the senses. We act in the