## One microphone singing voice separation using source-adapted model, in Proc. WASPAA (2005)

Citations: 24 (1 self)

### BibTeX

@MISC{Ozerov05onemicrophone,
author = {Alexey Ozerov and Rémi Gribonval and Frédéric Bimbot},
title = {One microphone singing voice separation using source-adapted model},
note = {In Proc. WASPAA},
year = {2005}
}

### Abstract

In this paper, the problem of one microphone source separation applied to singing voice extraction is studied. A probabilistic approach based on Gaussian Mixture Models (GMM) of the short time spectra of two sources is used. The question of source model adaptation is investigated in order to improve separation quality. A new adaptation method, consisting of a filter adaptation technique via Maximum Likelihood Linear Regression (MLLR), is presented with an associated filter-adapted training phase.

### Citations

9054 | Maximum likelihood from incomplete data via the EM algorithm
- Dempster, Laird, et al.
- 1977
Citation Context ...rned by maximization of the likelihoods $p(\bar{V} \mid \Sigma_v)$ and $p(\bar{M} \mid \Sigma_m)$, given $\bar{V}$ and $\bar{M}$, the STFT of the training signals. This maximization is achieved using the Expectation Maximization (EM) algorithm [6] initialized by Vector Quantization (VQ). For example, in the case of voice model estimation, the observed data $\eta = \bar{V}$ is completed by the latent data $\theta = q_v$ (state sequence), and the model paramete...
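
The EM training step quoted above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: each GMM state is treated as a zero-mean complex Gaussian with a diagonal covariance (one variance per frequency bin), the input is an array of power spectra $|V_t(f)|^2$, the function name `em_zero_mean_gmm` is hypothetical, and a random hard partition of the frames stands in for the Vector Quantization initialization.

```python
import numpy as np

def em_zero_mean_gmm(power, n_states, n_iter=20, seed=0, eps=1e-12):
    """Fit state weights and per-bin variances by EM.

    power: (T, F) array of short-time power spectra |V_t(f)|^2.
    Returns (weights, variances) with shapes (K,) and (K, F).
    Assumes T >= n_states so that every state starts with at least one frame.
    """
    rng = np.random.default_rng(seed)
    T, F = power.shape
    # Stand-in for VQ initialization (assumption): random hard partition.
    labels = rng.permutation(T) % n_states
    weights = np.bincount(labels, minlength=n_states) / T
    variances = np.stack(
        [power[labels == i].mean(axis=0) for i in range(n_states)]
    ) + eps
    for _ in range(n_iter):
        # E-step: log(weight * density) per frame and state, up to a constant.
        log_p = (np.log(weights)[None, :]
                 - np.log(variances).sum(axis=1)[None, :]
                 - power @ (1.0 / variances).T)
        log_p -= log_p.max(axis=1, keepdims=True)  # numerical stability
        gamma = np.exp(log_p)
        gamma /= gamma.sum(axis=1, keepdims=True)  # state posteriors
        # M-step: re-estimate weights and variances from the posteriors.
        occupancy = gamma.sum(axis=0) + eps
        weights = occupancy / occupancy.sum()
        variances = (gamma.T @ power) / occupancy[:, None] + eps
    return weights, variances
```

With a single state the procedure reduces to averaging the power spectra over frames, which gives a quick sanity check of the M-step.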

137 | Space-Alternating Generalized Expectation-Maximization Algorithm
- Fessler, Hero
- 1994
Citation Context ...stimated parameters $\xi = \{H, \Sigma_v\}$ to solve the problem (13), since the maximization step is not easy to solve jointly on $\Sigma_v$ and $H$. Instead, a version of the Space-Alternating Generalized EM (SAGE) algorithm [8] is used. The set of estimated parameters $\xi$ is split into two parts, $\xi_1 = H$ and $\xi_2 = \Sigma_v$. Iteration number $l+1$ of this algorithm consists of two EM algorithm iterations (4). The first iteration is app...

118 | One microphone source separation
- Roweis
- 2000
Citation Context ...adaptation technique via the Maximum Likelihood Linear Regression (MLLR) is presented with an associated filter-adapted training phase. 1. INTRODUCTION The problem of one microphone source separation [1] is a challenging task. In this paper, this problem is studied in the case of singing voice extraction from mono audio recordings. The approach is based on a priori probabilistic models for two source...

55 | A Bayesian Estimation Approach for Speech Enhancement Using Hidden Markov Models
- Ephraim
- 1992
Citation Context ... and a music signal $m(n)$ (called sources), where $n$ is a discrete time index ($x(n) = v(n) + m(n)$). The aim is to estimate the voice contribution $\hat{v}(n)$ in the observed signal $x(n)$. For speech enhancement [2] and separation of several sources in a monophonic musical recording [3] it has been proposed to model the short time spectra of the sources by Gaussian Mixture Models (GMM). These models are learned ...

50 | Integrated models of signal and background with application to speaker identification in noise
- Rose, Hofstetter, et al.
- 1994
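Citation Context ...-adapted separation scheme. $(X_t = M_t)_{t \notin \mathrm{voc}}$, $(X_t)_{t \in \mathrm{voc}}$, latent data $\theta = \{q_v, q_m, (V_t)_{t \in \mathrm{voc}}\}$ and model parameters $\xi = \Sigma_v$, leading in the case of our GMM models to the following re-estimation equations [7]:

$$\omega_{vi}^{(l+1)} = \frac{1}{T_{\mathrm{voc}}} \sum_{t \in \mathrm{voc}} \sum_j \gamma_{ij}^{(l)}(t)$$

$$[\sigma_{vi}^{(l+1)}(f)]^2 = \frac{\sum_{t \in \mathrm{voc}} \sum_j \gamma_{ij}^{(l)}(t) \, \langle |V_t(f)|^2 \rangle_{ij}^{(l)}}{\sum_{t \in \mathrm{voc}} \sum_j \gamma_{ij}^{(l)}(t)}, \quad (6)$$

$$\langle |V_t(f)|^2 \rangle_{ij}^{(l)} = E\left[ |V_t(f)|^2 \mid \ldots \right] = \frac{[\sigma_{vi}^{(l)}(f)]^2 \, \sigma_{mj}^2(f)}{[\sigma_{vi}^{(l)}(f)]^2 + \sigma_{mj}^2(f)} + \ldots$$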

46 | Variance compensation within the MLLR framework for robust speech recognition and speaker adaptation
- Gales, Pye, et al.
- 1996
Citation Context ... a linear transformation of the feature space (short time spectra). In that case the transformation is a linear filter, whose estimation (referred to as filter adaptation) is based on the MLLR framework [5]. A filter-adapted training procedure for a general voice model is also presented. The paper is organized in the following way. The GMM-based one microphone source separation technique [2, 3] is descr...

36 | Proposals for performance measurement in source separation
- Gribonval, Benaroya, et al.
- 2003
Citation Context ...005, New Paltz, NY 5.2. Performance measure To measure the quality of the estimation $\hat{v}$ with respect to the original singing voice $v$, we use the Source to Distortion Ratio (SDR) calculated as follows [9]:

$$\mathrm{SDR}(\hat{v}, v) = 10 \log_{10} \frac{\langle \hat{v}, v \rangle^2}{\|\hat{v}\|^2 \|v\|^2 - \langle \hat{v}, v \rangle^2} \quad (17)$$

where $\langle \hat{v}, v \rangle$ is the scalar product of $\hat{v}$ and $v$, and $\|v\|^2$ is the energy of $v$. To evaluate the separation performance for one recording, the Normalized...
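
As a small worked illustration of the SDR definition (17) quoted above, the ratio can be computed directly from two real-valued signals of equal length. The function name `sdr_db` and the use of NumPy are choices made here for illustration; the paper's own evaluation code is not shown in this excerpt.

```python
import numpy as np

def sdr_db(v_hat, v):
    """Source to Distortion Ratio in dB between an estimate and a reference.

    Both inputs are 1-D real-valued signals of the same length.
    """
    v_hat = np.asarray(v_hat, dtype=float)
    v = np.asarray(v, dtype=float)
    inner_sq = np.dot(v_hat, v) ** 2                       # <v_hat, v>^2
    denom = np.dot(v_hat, v_hat) * np.dot(v, v) - inner_sq # energies minus inner_sq
    return 10.0 * np.log10(inner_sq / denom)

# Reference [1, 0] and estimate [1, 1]: the numerator and denominator both equal 1.
print(sdr_db([1.0, 1.0], [1.0, 0.0]))  # 0.0
```

The measure is invariant to the gain of the estimate, since scaling $\hat{v}$ multiplies the numerator and the denominator by the same factor. When the estimate is exactly proportional to the reference, the denominator is zero and the ratio is unbounded.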

15 | Wiener based source separation with HMM/GMM using a single sensor
- Benaroya, Bimbot
- 2003
Citation Context ...dex ($x(n) = v(n) + m(n)$). The aim is to estimate the voice contribution $\hat{v}(n)$ in the observed signal $x(n)$. For speech enhancement [2] and separation of several sources in a monophonic musical recording [3] it has been proposed to model the short time spectra of the sources by Gaussian Mixture Models (GMM). These models are learned from training sources. The performance obtained with general models, i.e...

9 | Blind clustering of popular music recordings based on singer voice characteristics
- Tsai, Wang, et al.
- 2003
Citation Context ...(voice and music) should be modeled. It may be more efficient to use adapted models, i.e., models with characteristics close to those of the mixed sources. For blind clustering of popular music, Tsai [4] proposes to adapt music and voice models directly from the recording. In a first phase each recording is automatically segmented into a succession of vocal and non-vocal parts. Then, an adapted music m...