Results 1-10 of 40
Multichannel nonnegative matrix factorization in convolutive mixtures for audio source separation
IEEE Trans. Audio, Speech, Language Process., 2010
Cited by 79 (17 self)
We consider inference in a general data-driven object-based model of multichannel audio data, assumed generated as a possibly underdetermined convolutive mixture of source signals. Each source is given a model inspired by nonnegative matrix factorization (NMF) with the Itakura-Saito divergence, which underlies a statistical model of superimposed Gaussian components. We address estimation of the mixing and source parameters using two methods. The first one consists of maximizing the exact joint likelihood of the multichannel data using an expectation-maximization algorithm. The second method consists of maximizing the sum of individual likelihoods of all channels using a multiplicative update algorithm inspired by NMF methodology. Our decomposition algorithms were applied to stereo music and assessed in terms of blind source separation performance. Index Terms: Multichannel audio, nonnegative matrix factorization, nonnegative tensor factorization, underdetermined convolutive blind source separation.
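As a rough illustration of the NMF building block this abstract refers to, the sketch below runs single-channel NMF under the Itakura-Saito divergence with multiplicative updates. It is not the paper's multichannel algorithm: the function names (`is_nmf`, `is_divergence`), the random initialization, and the square-root exponent on the update ratio (a variant chosen here because it is known not to increase the divergence) are all assumptions made for this example, and plain Python lists are used only to keep it dependency-free.

```python
import math
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def is_divergence(V, V_hat):
    """Itakura-Saito divergence: sum over entries of v/v_hat - log(v/v_hat) - 1."""
    total = 0.0
    for row_v, row_h in zip(V, V_hat):
        for v, v_hat in zip(row_v, row_h):
            ratio = v / v_hat
            total += ratio - math.log(ratio) - 1.0
    return total


def _ratio_terms(V, V_hat):
    """Return the elementwise matrices V / V_hat**2 and 1 / V_hat."""
    P = [[v / (h * h) for v, h in zip(rv, rh)] for rv, rh in zip(V, V_hat)]
    Q = [[1.0 / h for h in rh] for rh in V_hat]
    return P, Q


def _scale(M, num, den):
    """Multiply M elementwise by sqrt(num / den)."""
    return [[m * math.sqrt(n / d) for m, n, d in zip(rm, rn, rd)]
            for rm, rn, rd in zip(M, num, den)]


def is_nmf(V, n_components, n_iter=30, seed=0):
    """Factor a strictly positive matrix V (freq x frames) as W @ H.

    Returns W, H and the divergence before the first update and after
    each iteration.
    """
    rng = random.Random(seed)
    n_freq, n_frames = len(V), len(V[0])
    W = [[rng.uniform(0.5, 1.5) for _ in range(n_components)] for _ in range(n_freq)]
    H = [[rng.uniform(0.5, 1.5) for _ in range(n_frames)] for _ in range(n_components)]
    history = [is_divergence(V, matmul(W, H))]
    for _ in range(n_iter):
        # Update W with H fixed.
        P, Q = _ratio_terms(V, matmul(W, H))
        Ht = transpose(H)
        W = _scale(W, matmul(P, Ht), matmul(Q, Ht))
        # Update H with the new W fixed.
        P, Q = _ratio_terms(V, matmul(W, H))
        Wt = transpose(W)
        H = _scale(H, matmul(Wt, P), matmul(Wt, Q))
        history.append(is_divergence(V, matmul(W, H)))
    return W, H, history


# Toy usage on a random positive "power spectrogram" (made-up data).
rng = random.Random(1)
V = [[rng.uniform(0.1, 2.0) for _ in range(8)] for _ in range(6)]
W, H, history = is_nmf(V, n_components=2)
```

In practice an array library would replace the list-based helpers; the update structure stays the same.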
The 2008 signal separation evaluation campaign: A community-based approach to large-scale evaluation
in ICA, 2009
Cited by 38 (12 self)
This paper introduces the first community-based Signal Separation Evaluation Campaign (SiSEC 2008), coordinated by the authors. This initiative aims to evaluate source separation systems following specifications agreed between the entrants. Four speech and music datasets were contributed, including synthetic mixtures as well as microphone recordings and professional mixtures. The source separation problem was split into four tasks, each evaluated via different objective performance criteria. We provide an overview of these datasets, tasks and criteria, summarize the results achieved by the submitted systems and discuss organization strategies for future campaigns.
The signal separation evaluation campaign (2007–2010): Achievements and remaining challenges
Signal Processing
Cited by 12 (7 self)
Consistent Wiener Filtering: Generalized Time-Frequency Masking Respecting Spectrogram Consistency
Cited by 10 (2 self)
Wiener filtering is one of the most widely used methods in audio source separation. It is often applied on time-frequency representations of signals, such as the short-time Fourier transform (STFT), to exploit their short-term stationarity, but so far the design of the Wiener time-frequency mask has not taken into account the necessity for the output spectrograms to be consistent, i.e., to correspond to the STFT of a time-domain signal. In this paper, we generalize the concept of Wiener filtering to time-frequency masks which can involve manipulation of the phase as well, by formulating the problem as a consistency-constrained maximum-likelihood one. We present two methods to solve the problem, one looking for the optimal time-domain signal, the other promoting consistency through a penalty function directly in the time-frequency domain. We show through experimental evaluation that, both in oracle conditions and combined with spectral subtraction, our method outperforms classical Wiener filtering.
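For reference, the classical Wiener time-frequency mask that this abstract takes as its baseline can be sketched as follows. This shows only the standard magnitude-domain mask, not the consistency-constrained method the paper proposes. The function names, the nested-list STFT layout `[freq][frame]`, and the small `eps` guard against division by zero are assumptions made for this example.

```python
def wiener_masks(source_powers, eps=1e-12):
    """Classical Wiener masks from per-source power estimates.

    source_powers[j][f][t] is the estimated power of source j in bin (f, t).
    Each mask is that source's share of the total power in the bin.
    """
    n_freq = len(source_powers[0])
    n_frames = len(source_powers[0][0])
    totals = [[sum(p[f][t] for p in source_powers) + eps
               for t in range(n_frames)] for f in range(n_freq)]
    return [[[p[f][t] / totals[f][t] for t in range(n_frames)]
             for f in range(n_freq)] for p in source_powers]


def apply_masks(mixture_stft, masks):
    """Scale each complex mixture coefficient by each source's real-valued mask."""
    return [[[m * x for m, x in zip(m_row, x_row)]
             for m_row, x_row in zip(mask, mixture_stft)] for mask in masks]


# Toy example: one frequency bin, two frames, two sources (made-up numbers).
mixture = [[1 + 1j, 2 + 0j]]
powers = [[[3.0, 1.0]], [[1.0, 1.0]]]
estimates = apply_masks(mixture, wiener_masks(powers))
```

Because the mask is real-valued, every estimate keeps the phase of the mixture. The estimates sum back to the mixture in each bin, but nothing forces an individual estimate to be the STFT of an actual time-domain signal, which is the consistency issue the abstract raises.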
Notes on nonnegative tensor factorization of the spectrogram for audio source separation: statistical insights and towards self-clustering of the spatial cues
in 7th International Symposium on Computer Music Modeling and Retrieval (CMMR), 2010
Cited by 8 (4 self)
Nonnegative tensor factorization (NTF) of multichannel spectrograms under PARAFAC structure has recently been proposed by Fitzgerald et al. as a means of performing blind source separation (BSS) of multichannel audio data. In this paper we investigate the statistical source models implied by this approach. We show that it implicitly assumes a non-point-source model contrasting with usual BSS assumptions and we clarify the links between the measure of fit chosen for the NTF and the implied statistical distribution of the sources. While the original approach of Fitzgerald et al. requires a posterior clustering of the spatial cues to group the NTF components into sources, we discuss means of performing the clustering within the factorization. In the results section we test the impact of the simplifying non-point-source assumption on underdetermined linear instantaneous mixtures of musical sources and discuss the limits of the approach for such mixtures.
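The PARAFAC structure mentioned in this abstract can be written out as below. The symbols are assumptions introduced for illustration and are not taken from the listing: $v_{fnc}$ is the (power or magnitude) spectrogram entry at frequency $f$, frame $n$ and channel $c$; $w_{fk}$, $h_{kn}$ and $q_{ck}$ are the nonnegative spectral, temporal and spatial factors of component $k$; and $\mathcal{K}_j$ is the set of components grouped into source $j$.

```latex
% Rank-K PARAFAC model of the multichannel spectrogram
\hat{v}_{fnc} = \sum_{k=1}^{K} q_{ck}\, w_{fk}\, h_{kn},
\qquad q_{ck},\; w_{fk},\; h_{kn} \ge 0 .

% Contribution of source j once components are grouped,
% e.g. by clustering the spatial cues (q_{1k}, \dots, q_{Ck})
\hat{v}^{(j)}_{fnc} = \sum_{k \in \mathcal{K}_j} q_{ck}\, w_{fk}\, h_{kn} .
```

Under this reading, the posterior clustering step amounts to choosing the sets $\mathcal{K}_j$ after the factors have been estimated, whereas clustering within the factorization constrains them during estimation.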
Multichannel extensions of nonnegative matrix factorization with complex-valued data
IEEE Transactions on Audio, Speech and Language Processing, 2013
Cited by 8 (1 self)
This paper presents new formulations and algorithms for multichannel extensions of nonnegative matrix factorization (NMF). The formulations employ Hermitian positive semidefinite matrices to represent a multichannel version of nonnegative elements. Multichannel Euclidean distance and multichannel Itakura-Saito (IS) divergence are defined based on appropriate statistical models utilizing multivariate complex Gaussian distributions. To minimize this distance/divergence, efficient optimization algorithms in the form of multiplicative updates are derived by using properly designed auxiliary functions. Two methods are proposed for clustering NMF bases according to the estimated spatial property. Convolutive blind source separation (BSS) is performed by the multichannel extensions of NMF with the clustering mechanism. Experimental results show that 1) the derived multiplicative update rules exhibited good convergence behavior, and 2) BSS tasks for several music sources with two microphones and three instrumental parts were evaluated successfully. Index Terms: Blind source separation, clustering, convolutive mixture, multichannel, nonnegative matrix factorization.
A general framework for online audio source separation
Paper presented at the 10th International Conference on Latent Variable Analysis and Source Separation (LVA/ICA 2012), pp. 364–371, Tel Aviv, 2012
Modeling Perceptual Similarity of Audio Signals for Blind Source Separation Evaluation
Cited by 7 (3 self)
Existing perceptual models of audio quality, such as PEAQ, were designed to measure audio codec performance and are not well suited to evaluation of audio source separation algorithms. The relationship of many other signal quality measures to human perception is not well established. We collected subjective human assessments of distortions encountered when separating audio sources from mixtures of two to four harmonic sources. We then correlated these assessments to 18 machine-measurable parameters. Results show a strong correlation (r=0.96) between a linear combination of a subset of four of these parameters and mean human assessments. This correlation is stronger than that between human assessments and several measures currently in use.
New formulations and efficient algorithms for multichannel NMF
in Proc. WASPAA ’11, 2011
Cited by 7 (3 self)
This paper proposes new formulations and algorithms for a multichannel extension of nonnegative matrix factorization (NMF), aimed at convolutive sound source separation with multiple microphones. The proposed formulation employs Hermitian positive semidefinite matrices to represent a multichannel version of nonnegative elements. Such matrices are basically estimated for NMF bases, but a source separation task can be performed by introducing variables that relate NMF bases and sources. Efficient optimization algorithms in the form of multiplicative updates are derived by using properly designed auxiliary functions. Experimental results show that two instrumental sounds coming from different directions were successfully separated by the proposed algorithm. Index Terms: nonnegative matrix factorization, multichannel, positive semidefinite, auxiliary function, source separation.
A block-based compressed sensing method for underdetermined blind speech separation incorporating binary mask
2010
Cited by 5 (3 self)
A block-based compressed sensing approach coupled with binary time-frequency masking is presented for the underdetermined speech separation problem. The proposed algorithm consists of multiple steps. First, the mixed signals are segmented into a number of blocks. For each block, the unknown mixing matrix is estimated in the transform domain by a clustering algorithm. Using the estimated mixing matrix, the sources are recovered by a compressed sensing approach. The coarsely separated sources are then used to estimate the time-frequency binary masks, which are further applied to enhance the separation performance. The separated source components from all the blocks are concatenated to reconstruct the whole signal. Numerical experiments are provided to show the improved separation performance of the proposed algorithm, as compared with two recent approaches. The block-based operation has the advantage of considerably improving the computational efficiency of the compressed sensing algorithm without degrading its separation performance. Index Terms: Underdetermined blind source separation (BSS), sparse representation, compressed sensing (CS), block-based processing, binary time-frequency mask.
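The masking step described in this abstract (estimating binary time-frequency masks from coarsely separated sources) could look roughly like the sketch below. It covers only that one step, under the common convention that each bin is assigned to the coarse estimate with the largest magnitude; the abstract does not state the exact rule, so this convention, the function names, and the `[source][freq][frame]` list layout are assumptions. The block segmentation, mixing-matrix clustering and compressed sensing recovery are not shown.

```python
def binary_masks(coarse_sources):
    """Build one 0/1 mask per source from coarse time-frequency estimates.

    coarse_sources[j][f][t] is the complex coefficient of coarse estimate j.
    Each bin goes to the source with the largest magnitude; ties go to the
    lowest source index.
    """
    n_src = len(coarse_sources)
    n_freq = len(coarse_sources[0])
    n_frames = len(coarse_sources[0][0])
    masks = [[[0] * n_frames for _ in range(n_freq)] for _ in range(n_src)]
    for f in range(n_freq):
        for t in range(n_frames):
            winner = max(range(n_src), key=lambda j: abs(coarse_sources[j][f][t]))
            masks[winner][f][t] = 1
    return masks


def apply_binary_masks(mixture_tf, masks):
    """Keep each mixture coefficient only in the source that owns its bin."""
    return [[[m * x for m, x in zip(m_row, x_row)]
             for m_row, x_row in zip(mask, mixture_tf)] for mask in masks]


# Toy example: one frequency bin, two frames, two coarse estimates (made-up).
coarse = [[[2 + 0j, 0.5j]], [[1 + 0j, 3 + 0j]]]
mixture = [[3 + 0j, 3 + 0.5j]]
masks = binary_masks(coarse)
refined = apply_binary_masks(mixture, masks)
```

Since the masks partition the time-frequency plane, each mixture coefficient ends up in exactly one refined source.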