Results 11-20 of 79
A general modular framework for audio source separation
In Proc. 9th Int. Conf. on Latent Variable Analysis and Signal Separation (LVA/ICA)
Cited by 8 (4 self)
Abstract. Most audio source separation methods are developed for a particular scenario characterized by the number of sources and channels and the characteristics of the sources and the mixing process. In this paper we introduce a general modular audio source separation framework based on a library of flexible source models that enable the incorporation of prior knowledge about the characteristics of each source. First, this framework generalizes several existing audio source separation methods, while providing a common formulation for them. Second, it makes it possible to design and implement new efficient methods that have not yet been reported in the literature. We first introduce the framework by describing the flexible model, explaining its generality, and summarizing our modular implementation using a Generalized Expectation-Maximization algorithm. Finally, we illustrate the above-mentioned capabilities of the framework by applying it in several new and existing configurations to different source separation scenarios.
Active-set Newton algorithm for overcomplete nonnegative representations of audio
IEEE Transactions on Audio, Speech, and Language Processing, 2013
Cited by 8 (8 self)
Abstract. This paper proposes a computationally efficient algorithm for estimating the nonnegative weights of linear combinations of the atoms of large-scale audio dictionaries, so that the generalized Kullback-Leibler divergence between an audio observation and the model is minimized. This linear model has been found useful in many audio signal processing tasks, but the existing algorithms are computationally slow when a large number of atoms is used. The proposed algorithm is based on iteratively updating a set of active atoms, with the weights updated using the Newton method and the step size estimated such that the weights remain nonnegative. Algorithm convergence evaluations on representing audio spectra that are mixtures of two speakers show that, with all the tested dictionary sizes, the proposed method reaches a much lower value of the divergence than can be obtained by conventional algorithms, and is up to 8 times faster. A source separation evaluation revealed that, when using large dictionaries, the proposed method produces better separation quality in less time. Index Terms: acoustic signal analysis, audio source separation, supervised source separation, nonnegative matrix factorization, Newton algorithm, convex optimization, sparse coding, sparse representation
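The objective in this abstract can be made concrete with a small sketch. The Python below (standard library only) computes the generalized Kullback-Leibler divergence and fits nonnegative weights for a fixed dictionary with the standard multiplicative update, i.e. the kind of conventional baseline the abstract compares against; it is not the proposed active-set Newton method. The function names are hypothetical and the dictionary is a made-up toy example.

```python
import math


def generalized_kl(x, y):
    """Generalized KL divergence D(x || y) = sum_f x_f*log(x_f/y_f) - x_f + y_f.

    Entries with x_f == 0 contribute y_f (x*log(x/y) tends to 0 as x -> 0).
    """
    total = 0.0
    for xf, yf in zip(x, y):
        if xf > 0:
            total += xf * math.log(xf / yf) - xf + yf
        else:
            total += yf
    return total


def reconstruct(atoms, w):
    """Linear model y = A w, with dictionary A stored as F rows of K atom values."""
    return [sum(a * wk for a, wk in zip(row, w)) for row in atoms]


def multiplicative_weights(atoms, x, iters=200, eps=1e-12):
    """Baseline multiplicative update for nonnegative weights w (dictionary fixed).

    w_k <- w_k * sum_f(A_fk * x_f / y_f) / sum_f(A_fk)
    Starting from positive weights, every update keeps them nonnegative.
    """
    n_freq, n_atoms = len(atoms), len(atoms[0])
    col_sums = [sum(atoms[f][k] for f in range(n_freq)) for k in range(n_atoms)]
    w = [1.0] * n_atoms
    for _ in range(iters):
        y = reconstruct(atoms, w)
        ratio = [xf / (yf + eps) for xf, yf in zip(x, y)]
        w = [
            w[k] * sum(atoms[f][k] * ratio[f] for f in range(n_freq)) / col_sums[k]
            for k in range(n_atoms)
        ]
    return w
```

For example, with `A = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0], [1.0, 1.0, 0.0], [2.0, 1.0, 1.0]]` and `x = reconstruct(A, [2.0, 0.0, 1.0])`, the weights returned by `multiplicative_weights(A, x)` give a lower divergence than the all-ones starting point. Each update touches every atom, which is why this baseline becomes slow for very large dictionaries.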
New formulations and efficient algorithms for multichannel NMF
In Proc. WASPAA ’11, 2011
Cited by 7 (3 self)
This paper proposes new formulations and algorithms for a multichannel extension of nonnegative matrix factorization (NMF), aimed at convolutive sound source separation with multiple microphones. The proposed formulation employs Hermitian positive semidefinite matrices to represent a multichannel version of nonnegative elements. Such matrices are basically estimated for NMF bases, but a source separation task can be performed by introducing variables that relate NMF bases and sources. Efficient optimization algorithms in the form of multiplicative updates are derived by using properly designed auxiliary functions. Experimental results show that two instrumental sounds coming from different directions were successfully separated by the proposed algorithm. Index Terms: nonnegative matrix factorization, multichannel, positive semidefinite, auxiliary function, source separation
Coding-based Informed Source Separation: Nonnegative Tensor Factorization Approach
2013
Cited by 6 (3 self)
Abstract. Informed source separation (ISS) aims at reliably recovering sources from a mixture. To this end, it relies on the assumption that the original sources are available during an encoding stage. Given both sources and mixture, side information may be computed and transmitted along with the mixture, whereas the original sources are no longer available. During a decoding stage, both mixture and side information are processed to recover the sources. ISS is motivated by a number of specific applications including active listening and remixing of music, karaoke, audio gaming, etc. Most ISS techniques proposed so far rely on a source separation strategy and cannot achieve better results than oracle estimators. In this study, we introduce Coding-based ISS (CISS) and draw the connection between ISS and source coding. CISS amounts to encoding the sources using not only a model, as in source coding, but also the observation of the mixture. This strategy has several advantages over conventional ISS methods. First, it can reach any quality, provided sufficient bandwidth is available, as in source coding. Second, it makes use of the mixture in order to reduce the bitrate required to transmit the sources, as in classical ISS. Furthermore, we introduce Nonnegative Tensor Factorization as a very efficient model for CISS and report rate-distortion results that strongly outperform the state of the art. Index Terms: Informed source separation, spatial audio object coding, source coding, constrained entropy quantization, probabilistic model, nonnegative tensor factorization.
Multichannel audio upmixing based on nonnegative tensor factorization representation
In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, 2011
Cited by 6 (0 self)
This paper proposes a new spatial audio coding (SAC) method that is based on parametrization of multichannel audio by sound objects using nonnegative tensor factorization (NTF). The spatial parameters are estimated using a perceptually motivated NTF model and are used for upmixing a downmixed and encoded mixture signal. The performance of the proposed coding is evaluated using listening tests, which show that the coding performance is on a par with conventional SAC methods. Additionally, the proposed coding enables controlling the upmix content by meaningful objects. Index Terms: Spatial audio coding, Object-based audio coding, Nonnegative tensor factorization
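As a rough illustration of the object-based parametrization, the sketch below evaluates a three-way NTF model of multichannel power spectrograms, in which each component k has a spectral basis, a time activation, and a per-channel gain. It assumes the common PARAFAC-style form; the perceptual weighting, the parameter estimation, and the coding and upmixing stages of the paper are not shown, and the function name is illustrative.

```python
def ntf_model(G, W, H):
    """Three-way NTF (PARAFAC-style) model of multichannel power spectrograms.

    V[c][f][t] ~= sum_k G[c][k] * W[f][k] * H[t][k]
    G: C x K channel gains (the spatial parameters of each component)
    W: F x K spectral bases
    H: T x K time activations
    """
    n_comp = len(G[0])
    return [
        [
            [sum(G[c][k] * W[f][k] * H[t][k] for k in range(n_comp)) for t in range(len(H))]
            for f in range(len(W))
        ]
        for c in range(len(G))
    ]
```

With a single component, `ntf_model([[1.0], [0.5]], [[2.0], [1.0]], [[1.0], [3.0]])` gives the same spectrogram in both channels, scaled by the channel gains 1.0 and 0.5. Grouping components into objects and changing their gains `G[c][k]` is what makes object-level control of the upmix possible in such a model.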
Underdetermined convolutive blind source separation using spatial covariance models
In Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2010
Cited by 6 (4 self)
This paper deals with the problem of underdetermined convolutive blind source separation. We model the contribution of each source to all mixture channels in the time-frequency domain as a zero-mean Gaussian random variable whose covariance encodes the spatial properties of the source. We consider two covariance models and address the estimation of their parameters from the recorded mixture by a suitable initialization scheme followed by an iterative expectation-maximization (EM) procedure in each frequency bin. We then align the order of the estimated sources across all frequency bins based on their estimated directions of arrival (DOA). Experimental results over a stereo reverberant speech mixture show the effectiveness of the proposed approach. Index Terms: Convolutive blind source separation, underdetermined mixtures, spatial covariance models, EM algorithm, permutation problem.
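Once the parameters are known, the local Gaussian model described above leads to a standard multichannel Wiener filter. The sketch below works on a stereo mixture at a single time-frequency bin and assumes each source image is zero-mean complex Gaussian with covariance v_j R_j (a scalar source variance times a 2x2 spatial covariance matrix); it returns the posterior-mean estimate of every source image. The EM estimation, the initialization scheme, and the DOA-based permutation alignment from the paper are not included, and the helper names are hypothetical.

```python
def mat_inv2(m):
    """Inverse of a 2x2 complex matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def mat_mul2(p, q):
    """Product of two 2x2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def mat_vec2(m, x):
    """2x2 matrix times a length-2 vector."""
    return [m[0][0] * x[0] + m[0][1] * x[1], m[1][0] * x[0] + m[1][1] * x[1]]


def wiener_source_images(x, variances, covariances):
    """Posterior-mean estimates of each source image at one time-frequency bin.

    Model: c_j ~ N(0, v_j * R_j), x = sum_j c_j.
    Estimate: c_j_hat = v_j R_j (sum_k v_k R_k)^{-1} x.
    """
    mix_cov = [
        [sum(v * R[i][j] for v, R in zip(variances, covariances)) for j in range(2)]
        for i in range(2)
    ]
    inv = mat_inv2(mix_cov)
    estimates = []
    for v, R in zip(variances, covariances):
        gain = mat_mul2([[v * R[i][j] for j in range(2)] for i in range(2)], inv)
        estimates.append(mat_vec2(gain, x))
    return estimates
```

Because the per-source gains sum to the identity, the estimated source images always add back up to the mixture `x` (up to rounding), whatever values are used for the variances and covariances.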
Blind Separation of Quasi-Stationary Sources: Exploiting Convex Geometry in Covariance Domain
2015
Cited by 3 (3 self)
This paper revisits blind source separation of instantaneously mixed quasi-stationary sources (BSS-QSS), motivated by the observation that in certain applications (e.g., speech) there exist time frames during which only one source is active, or locally dominant. Combined with nonnegativity of source powers, this endows the problem with a nice convex geometry that enables elegant and efficient BSS solutions. Local dominance is tantamount to the so-called pure pixel/separability assumption in hyperspectral unmixing/nonnegative matrix factorization, respectively. Building on this link, a very simple algorithm called the successive projection algorithm (SPA) is considered for estimating the mixing system in closed form. To complement SPA in the specific BSS-QSS context, an algebraic preprocessing procedure is proposed to suppress short-term source cross-correlation interference. The proposed procedure is simple, effective, and supported by theoretical analysis. Solutions based on volume minimization (VolMin) are also considered. By theoretical analysis, it is shown that VolMin guarantees perfect mixing system identifiability under an assumption more relaxed than (exact) local dominance, which means wider applicability in practice. Exploiting the specific structure of BSS-QSS, a fast VolMin algorithm is proposed for the overdetermined case. Careful simulations using real speech sources showcase the simplicity, efficiency, and accuracy of the proposed algorithms.
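SPA itself is short enough to sketch. The version below works on a generic list of data columns: it repeatedly selects the column with the largest residual norm and projects all columns onto the orthogonal complement of that choice. In the paper it is applied to covariance-domain statistics after the proposed preprocessing, which is omitted here; the function names and the toy columns in the usage note are assumptions for illustration.

```python
def dot(u, v):
    """Euclidean inner product of two equal-length real vectors."""
    return sum(a * b for a, b in zip(u, v))


def successive_projection(columns, r):
    """Successive projection algorithm (SPA) on a list of data columns.

    Repeatedly picks the column with the largest residual Euclidean norm, then
    projects every column onto the orthogonal complement of the picked one.
    Returns the indices of (at most) r selected columns.
    """
    residual = [list(col) for col in columns]
    selected = []
    for _ in range(r):
        norms = [dot(col, col) for col in residual]
        j = max(range(len(norms)), key=norms.__getitem__)
        if norms[j] == 0.0:
            break  # every column is already explained by the selected ones
        selected.append(j)
        u = residual[j]
        residual = [
            [c - (dot(col, u) / norms[j]) * ui for c, ui in zip(col, u)]
            for col in residual
        ]
    return selected
```

With the columns `[1.0, 0.0, 1.0]`, `[0.0, 2.0, 1.0]`, and their average `[0.5, 1.0, 1.0]`, `successive_projection(columns, 2)` returns `[1, 0]`: it selects the two "pure" columns and never the mixed one, which mirrors how local dominance lets SPA identify the mixing system in closed form.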
PROFESSIONALLY-PRODUCED MUSIC SEPARATION GUIDED BY COVERS
Cited by 2 (0 self)
This paper addresses the problem of demixing professionally produced music, i.e., recovering the musical source signals that compose a (2-channel stereo) commercial mix signal. Inspired by previous studies using MIDI-synthesized or hummed signals as external references, we propose to use the multitrack signals of a cover interpretation to guide the separation process with a relevant initialization. This process is carried out within the framework of the multichannel convolutive NMF model and associated EM/MU estimation algorithms. Although subject to the limitations of the convolutive assumption, our experiments confirm the potential of using multitrack cover signals for source separation of commercial music.
Direction of arrival based spatial covariance model for blind sound source separation
IEEE Transactions on Audio, Speech, and Language Processing, 2014
Cited by 2 (2 self)
Abstract. This paper addresses the problem of sound source separation from a multichannel microphone array capture via estimation of the source spatial covariance matrix (SCM) of a short-time Fourier transformed mixture signal. In many conventional audio separation algorithms, the source mixing parameters are estimated separately for each frequency, which makes them prone to errors and leads to suboptimal source estimates. In this paper we propose an SCM model that consists of a weighted sum of direction of arrival (DoA) kernels, and we estimate only the weights, which depend on the source directions. In the proposed algorithm, the spatial properties of the sources are jointly optimized over all frequencies, leading to more coherent source estimates and mitigating the effect of spatial aliasing at high frequencies. The proposed SCM model is combined with a linear model for magnitudes, and the parameter estimation is formulated in a complex-valued nonnegative matrix factorization (CNMF) framework. The simulations use recordings made with an array the size of a handheld device, with multiple microphones embedded inside the device casing. The separation quality of the proposed algorithm is shown to exceed that of existing state-of-the-art separation methods with two sources when evaluated by objective separation quality metrics. Index Terms: multichannel source separation, spatial covariance models, nonnegative matrix factorization, direction of arrival estimation, array signal processing
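Tying the spatial model across frequency can be sketched as follows. Assuming far-field plane waves and microphones on a line, each DoA kernel below is taken to be the rank-one matrix h h^H built from the steering vector h of one look direction. A source's SCM at a given frequency is then a nonnegative weighted sum of these kernels, with the same weights used at every frequency. The exact kernel definition, the array geometry, and the 343 m/s speed of sound are assumptions rather than details taken from the paper, and the function names are hypothetical.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value


def steering_vector(freq_hz, azimuth_rad, mic_x):
    """Far-field plane-wave steering vector for microphones on the x-axis (meters)."""
    return [
        cmath.exp(-2j * math.pi * freq_hz * x * math.cos(azimuth_rad) / SPEED_OF_SOUND)
        for x in mic_x
    ]


def doa_kernel(freq_hz, azimuth_rad, mic_x):
    """Rank-one kernel h h^H for one look direction (Hermitian, positive semidefinite)."""
    h = steering_vector(freq_hz, azimuth_rad, mic_x)
    return [[hi * hj.conjugate() for hj in h] for hi in h]


def spatial_covariance(freq_hz, weights, azimuths, mic_x):
    """SCM of one source at one frequency as a nonnegative weighted sum of DoA kernels.

    The weights depend only on direction, so the same weights are reused at
    every frequency; this is what couples the spatial model across frequencies.
    """
    m = len(mic_x)
    scm = [[0j] * m for _ in range(m)]
    for z, theta in zip(weights, azimuths):
        kernel = doa_kernel(freq_hz, theta, mic_x)
        for a in range(m):
            for b in range(m):
                scm[a][b] += z * kernel[a][b]
    return scm
```

Since each steering-vector entry has unit magnitude, every diagonal entry of the resulting SCM equals the sum of the weights, and the matrix stays Hermitian for any choice of nonnegative weights.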
INTERACTIVE REFINEMENT OF SUPERVISED AND SEMI-SUPERVISED SOUND SOURCE SEPARATION ESTIMATES
Cited by 2 (0 self)
We propose an interactive refinement method for supervised and semi-supervised single-channel source separation. The refinement method allows end users to provide feedback to the separation process by painting on spectrogram displays of intermediate output results. The time-frequency annotations are then used to update the separation estimates and iteratively refine the results. The initial separation is performed using probabilistic latent component analysis and is then extended to incorporate the painting annotations using linear grouping expectation constraints via the framework of posterior regularization. Using a prototype user interface, we show that the method is able to perform high-quality separation with minimal user interaction. Index Terms: source separation, probabilistic latent component analysis, user interaction
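Here is one way to picture how painting annotations enter the estimate. In posterior regularization with linear expectation constraints, the constrained posterior takes the form q(s) proportional to p(s) exp(-lambda_s), where lambda_s are penalty weights. The sketch below applies that reweighting to a single time-frequency bin, assuming the user's paint strokes have already been turned into per-source penalties. It is a simplified illustration rather than the authors' algorithm: the PLCA model, its M-step, and the mapping from strokes to penalties are omitted, and the function names are hypothetical.

```python
import math


def regularized_posterior(posterior, penalties):
    """Reweight one time-frequency bin's source posterior with user penalties.

    posterior: p(s | f, t) for each source s under the current model (sums to 1).
    penalties: nonnegative values derived from the user's painting; a large value
    for source s means "this bin should not belong to s".
    Returns q(s) proportional to p(s) * exp(-penalty_s), renormalized.
    """
    weighted = [p * math.exp(-lam) for p, lam in zip(posterior, penalties)]
    total = sum(weighted)
    return [w / total for w in weighted]


def separate_bin(magnitude, posterior, penalties):
    """Split one mixture magnitude value among sources using the reweighted posterior."""
    return [magnitude * q for q in regularized_posterior(posterior, penalties)]
```

With zero penalties the posterior is unchanged. Painting a bin against source 0, for example with penalties `[2.0, 0.0]` on a posterior of `[0.6, 0.4]`, moves most of that bin's energy to source 1. Repeating this over all painted bins and re-estimating the model gives the iterative refinement loop described in the abstract.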