Results 1-10 of 79
A general flexible framework for the handling of prior information in audio source separation
IEEE Transactions on Audio, Speech and Signal Processing, 2012
Cited by 45 (17 self)
Most audio source separation methods are developed for a particular scenario characterized by the number of sources and channels and the characteristics of the sources and the mixing process. In this paper we introduce a general audio source separation framework based on a library of structured source models that enable the incorporation of prior knowledge about each source via user-specifiable constraints. While this framework generalizes several existing audio source separation methods, it also makes it possible to devise and implement new efficient methods that have not yet been reported in the literature. We first introduce the framework by describing the model structure and constraints, explaining its generality, and summarizing its algorithmic implementation using a generalized expectation-maximization algorithm. Finally, we illustrate the above-mentioned capabilities of the framework by applying it in several new and existing configurations to different source separation problems. We have released a software tool named Flexible Audio Source Separation Toolbox (FASST) implementing a baseline version of the framework in Matlab. Index Terms: audio source separation, local Gaussian model, nonnegative matrix factorization, expectation-maximization.
A robust method to count and locate audio sources in a stereophonic linear instantaneous mixture
in ICA, 2006
Cited by 41 (10 self)
We propose a method to count and estimate the mixing directions in an underdetermined multichannel mixture. The approach is based on the hypothesis that in the neighbourhood of some time-frequency points, only one source essentially contributes to the mixture: such time-frequency points can provide robust local estimates of the corresponding source direction. At the core of our contribution is a statistical model to exploit a local confidence measure which detects the time-frequency regions where such robust information is available. A clustering algorithm called DEMIX is proposed to merge the information from all time-frequency regions according to their confidence level. To estimate the delays of anechoic mixtures and overcome the intrinsic ambiguities of phase unwrapping encountered with DUET, we propose a technique similar to GCC-PHAT which is able to estimate delays that can largely exceed one sample. We present an extensive experimental study which shows that the resulting method is more robust in conditions where all comparable DUET-like methods fail, in particular: a) when time delays largely exceed one sample; b) when the source directions are very close. Index Terms: blind source separation, multichannel audio, delay estimation, sparse component analysis, direction of arrival.
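The idea of a local direction estimate with an attached confidence can be illustrated with a minimal sketch. It is not the DEMIX algorithm: it assumes a stereo linear instantaneous mixture with real-valued coefficients, and it uses the eigenvalue ratio of the local 2x2 second-moment matrix as a stand-in for the paper's confidence measure, which the abstract does not specify. The function name local_direction is hypothetical.

```python
import math

def local_direction(points):
    """Estimate a mixing direction from stereo coefficients in one local region.

    points: list of (left, right) real-valued coefficient pairs taken from
    the neighbourhood of one time-frequency point.
    Returns (angle, confidence): the angle in radians of the principal axis,
    and the ratio of the largest to the smallest eigenvalue of the 2x2
    second-moment matrix (large when a single source dominates the region).
    """
    n = len(points)
    c11 = sum(l * l for l, _ in points) / n
    c22 = sum(r * r for _, r in points) / n
    c12 = sum(l * r for l, r in points) / n
    # Closed-form eigen-decomposition of a symmetric 2x2 matrix.
    half_trace = (c11 + c22) / 2
    radius = math.hypot((c11 - c22) / 2, c12)
    lam_max = half_trace + radius
    lam_min = max(half_trace - radius, 1e-12)  # floor avoids division by zero
    angle = 0.5 * math.atan2(2 * c12, c11 - c22)
    return angle, lam_max / lam_min

# Region dominated by one source whose direction is 60 degrees.
single = [(s * math.cos(math.pi / 3), s * math.sin(math.pi / 3))
          for s in (1.0, -2.0, 0.5, 3.0)]
angle_single, conf_single = local_direction(single)

# Region where two sources with orthogonal directions contribute equally.
mixed = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
angle_mixed, conf_mixed = local_direction(mixed)
```

In a scheme of this kind, only regions with high confidence would be passed on to the clustering stage, so that regions where several sources overlap do not pollute the direction estimates.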
The 2008 signal separation evaluation campaign: A community-based approach to large-scale evaluation
in ICA, 2009
Cited by 38 (12 self)
This paper introduces the first community-based Signal Separation Evaluation Campaign (SiSEC 2008), coordinated by the authors. This initiative aims to evaluate source separation systems following specifications agreed between the entrants. Four speech and music datasets were contributed, including synthetic mixtures as well as microphone recordings and professional mixtures. The source separation problem was split into four tasks, each evaluated via different objective performance criteria. We provide an overview of these datasets, tasks and criteria, summarize the results achieved by the submitted systems and discuss organization strategies for future campaigns.
Probabilistic modeling paradigms for audio source separation
In Machine Audition: Principles, Algorithms and Systems. IGI Global, 2010
Cited by 25 (14 self)
Most sound scenes result from the superposition of several sources, which can be separately perceived and analyzed by human listeners. Source separation aims to provide machine listeners with similar skills by extracting the sounds of individual sources from a given scene. Existing separation systems operate either by emulating the human auditory system or by inferring the parameters of probabilistic sound models. In this chapter, we focus on the latter approach and provide a joint overview of established and recent models, including independent component analysis, local time-frequency models and spectral template-based models. We show that most models are instances of one of the following two general paradigms: linear modeling or variance modeling. We compare the merits of either paradigm and report objective performance figures. We conclude by discussing promising combinations of probabilistic priors and inference algorithms that could form the basis of future state-of-the-art systems.
Nonnegative matrix factorization and spatial covariance model for underdetermined reverberant audio source separation
Cited by 15 (6 self)
We address the problem of blind audio source separation in the underdetermined and convolutive case. The contribution of each source to the mixture channels in the time-frequency domain is modeled by a zero-mean Gaussian random vector with a full-rank covariance matrix composed of two terms: a variance, which represents the spectral properties of the source and is modeled by nonnegative matrix factorization (NMF), and another full-rank covariance matrix, which encodes the spatial properties of the source contribution in the mixture. We address the estimation of these parameters by maximizing the likelihood of the mixture using an expectation-maximization (EM) algorithm. Theoretical propositions are corroborated by experimental studies on stereo reverberant music mixtures.
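The covariance structure described above can be sketched for a single time-frequency bin of a stereo mixture. The sketch assumes the parameters are already known and applies multichannel Wiener filtering, which is the standard way to extract source contributions under such a Gaussian model. The abstract itself only covers parameter estimation by EM, so the filtering step, the toy values and all function names here are assumptions, not the authors' implementation.

```python
def nmf_variance(W, H, f, n):
    """Source variance v(f, n) = sum_k W[f][k] * H[k][n] (NMF model)."""
    return sum(W[f][k] * H[k][n] for k in range(len(H)))

def scale(s, A):
    return [[s * a for a in row] for row in A]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def matvec(A, x):
    return [sum(A[i][k] * x[k] for k in range(2)) for i in range(2)]

def inv2(A):
    """Inverse of a 2x2 matrix (stereo case only)."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / det, -A[0][1] / det],
            [-A[1][0] / det, A[0][0] / det]]

def wiener_separate(x, variances, spatial_covs):
    """Estimate each source's contribution to the mixture vector x.

    Each source j has covariance v_j * R_j; the mixture covariance is
    their sum, and the estimate is v_j * R_j * inv(mixture_cov) * x.
    """
    source_covs = [scale(v, R) for v, R in zip(variances, spatial_covs)]
    mix_cov = source_covs[0]
    for S in source_covs[1:]:
        mix_cov = add(mix_cov, S)
    mix_inv = inv2(mix_cov)
    return [matvec(matmul(S, mix_inv), x) for S in source_covs]

# Toy underdetermined case: 3 sources, 2 channels, one bin (f = 0, n = 0).
W = [[[1.0, 0.5]], [[0.2, 0.1]], [[0.3, 0.3]]]        # per source: F=1 x K=2
H = [[[2.0], [1.0]], [[1.0], [4.0]], [[0.5], [0.5]]]  # per source: K=2 x N=1
R = [[[1.0, 0.5], [0.5, 1.0]],                        # full-rank, Hermitian
     [[1.0, -0.3j], [0.3j, 1.0]],
     [[2.0, 0.0], [0.0, 0.5]]]
v = [nmf_variance(W[j], H[j], 0, 0) for j in range(3)]
x = [1.0 + 2.0j, -0.5 + 0.3j]
images = wiener_separate(x, v, R)
```

Because the per-source filters sum to the identity matrix, the estimated contributions add up exactly to the observed mixture, which is a convenient sanity check for any implementation of this model.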
The signal separation evaluation campaign (2007–2010): Achievements and remaining challenges
Signal Processing
Cited by 12 (7 self)
Multichannel nonnegative tensor factorization with structured constraints for user-guided audio source separation
in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'11), 2011
Cited by 12 (6 self)
Separating multiple tracks from professionally produced music recordings (PPMRs) is still a challenging problem. We address this task with a user-guided approach in which the separation system is provided with segmental information indicating the time activations of the particular instruments to separate. This information may typically be retrieved from manual annotation. We use a so-called multichannel nonnegative tensor factorization (NTF) model, in which the original sources are observed through a multichannel convolutive mixture and in which the source power spectrograms are jointly modeled by a 3-valence (time/frequency/source) tensor. Our user-guided separation method produced competitive results at the 2010 Signal Separation Evaluation Campaign, with sufficient quality for real-world music editing applications. Index Terms: audio source separation, user-guided, nonnegative tensor factorization, generalized expectation maximization.
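A 3-valence tensor model of the source power spectrograms, combined with user-provided time activations, can be sketched as follows. The sketch assumes a PARAFAC-style factorization V[j][f][n] = sum_k Q[j][k] * W[f][k] * H[n][k], and assumes the annotations are enforced by zeroing the activations of components whose sources are marked silent. Both choices, and all names below, are illustrative assumptions rather than details confirmed by the abstract.

```python
def ntf_power_spectrograms(W, H, Q):
    """Build V[j][f][n] = sum_k Q[j][k] * W[f][k] * H[n][k].

    W: F x K spectral patterns, H: N x K time activations,
    Q: J x K nonnegative weights assigning components to sources.
    """
    J, F, N, K = len(Q), len(W), len(H), len(W[0])
    return [[[sum(Q[j][k] * W[f][k] * H[n][k] for k in range(K))
              for n in range(N)]
             for f in range(F)]
            for j in range(J)]

def apply_time_annotations(H, Q, active):
    """Zero H[n][k] when no source using component k is active in frame n.

    active[j][n] is True when the user marked source j as active in frame n.
    Returns a new activation matrix; the input H is left unchanged.
    """
    J, N, K = len(Q), len(H), len(H[0])
    masked = []
    for n in range(N):
        row = []
        for k in range(K):
            allowed = any(Q[j][k] > 0 and active[j][n] for j in range(J))
            row.append(H[n][k] if allowed else 0.0)
        masked.append(row)
    return masked

# Toy example: 2 sources, 2 components, 2 frequency bins, 3 frames.
Q = [[1.0, 0.0], [0.0, 1.0]]           # component 0 -> source 0, 1 -> source 1
W = [[1.0, 0.2], [0.5, 2.0]]
H = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
active = [[True, True, False],         # source 0 silent in the last frame
          [False, True, True]]         # source 1 silent in the first frame
H_masked = apply_time_annotations(H, Q, active)
V = ntf_power_spectrograms(W, H_masked, Q)
```

One reason such a constraint is attractive is that multiplicative update rules, commonly used for nonnegative factorizations, keep zero entries at zero, so annotations imposed at initialization persist throughout estimation.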
Main instrument separation from stereophonic audio signals using a source/filter model
Cited by 12 (3 self)
We propose a new approach to solo/accompaniment separation from stereophonic music recordings which extends a monophonic algorithm we recently proposed. The solo part is modelled using a source/filter model to which we added two contributions: an explicit smoothing strategy for the filter frequency responses and an unvoicing model to capture the stochastic parts of the solo voice. The accompaniment is modelled as a general instantaneous mixture of several components, leading to a Nonnegative Matrix Factorization framework. The stereophonic signal is assumed to be the instantaneous mixture of the solo and accompaniment contributions. Both channels are then jointly used within a Maximum Likelihood framework to estimate all the parameters. Three rounds of parameter estimation are necessary to sequentially estimate the melody, the voiced part and finally the unvoiced part of the solo. Our tests show a clear improvement from a monophonic reference system to the proposed stereophonic system, especially when including the unvoicing model. The smoothness of the filters does not provide the desired improvement in solo/accompaniment separation, but may be useful in future applications such as lyrics recognition. Finally, our submissions to the Signal Separation Evaluation Campaign (SiSEC), for the “Professionally Produced Music Recordings” task, obtained very good results.
Notes on nonnegative tensor factorization of the spectrogram for audio source separation: statistical insights and towards self-clustering of the spatial cues
in 7th International Symposium on Computer Music Modeling and Retrieval (CMMR), 2010
Cited by 8 (4 self)
Nonnegative tensor factorization (NTF) of multichannel spectrograms under PARAFAC structure has recently been proposed by Fitzgerald et al. as a means of performing blind source separation (BSS) of multichannel audio data. In this paper we investigate the statistical source models implied by this approach. We show that it implicitly assumes a non-point-source model, in contrast with usual BSS assumptions, and we clarify the links between the measure of fit chosen for the NTF and the implied statistical distribution of the sources. While the original approach of Fitzgerald et al. requires an a posteriori clustering of the spatial cues to group the NTF components into sources, we discuss means of performing the clustering within the factorization. In the results section we test the impact of the simplifying non-point-source assumption on underdetermined linear instantaneous mixtures of musical sources and discuss the limits of the approach for such mixtures.
Informed source separation using latent components
in Proc. LVA/ICA, 2010
Cited by 8 (5 self)
We address the issue of source separation in a particular informed configuration where both the sources and the mixtures are assumed to be known during a so-called encoding stage. This knowledge enables the computation of side information which ought to be small enough to be watermarked in the mixtures. At the decoding stage, the sources are no longer assumed to be known; only the mixtures and the side information are processed to perform source separation. The proposed method models the sources jointly using latent variables in a framework close to multichannel nonnegative matrix factorization and models the mixing process as linear filtering. Separation at the decoding stage is done using generalized Wiener filtering of the mixtures. An experimental setup shows that the method gives very satisfying results with mixtures composed of many sources. A study of its performance with respect to the number of latent variables is presented.