Multichannel nonnegative tensor factorization with structured constraints for user-guided audio source separation
 in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP’11)
, 2011
Separating multiple tracks from professionally produced music recordings (PPMRs) is still a challenging problem. We address this task with a user-guided approach in which the separation system is provided segmental information indicating the time activations of the particular instruments to separate. This information may typically be retrieved from manual annotation. We use a so-called multichannel nonnegative tensor factorization (NTF) model, in which the original sources are observed through a multichannel convolutive mixture and in which the source power spectrograms are jointly modeled by a 3-valence (time/frequency/source) tensor. Our user-guided separation method produced competitive results at the 2010 Signal Separation Evaluation Campaign, with sufficient quality for real-world music editing applications. Index Terms — Audio source separation, user-guided, nonnegative tensor factorization, generalized expectation maximization.
Majorization-minimization algorithm for smooth Itakura-Saito nonnegative matrix factorization
 in ICASSP
, 2011
Nonnegative matrix factorization (NMF) with the Itakura-Saito divergence has proven efficient for audio source separation and music transcription, where the signal power spectrogram is factored into a “dictionary” matrix times an “activation” matrix. Given the nature of audio signals, it is expected that the activation coefficients exhibit smoothness along time frames. This may be enforced by penalizing the NMF objective function with an extra term reflecting smoothness of the activation coefficients. We propose a novel regularization term that solves some deficiencies of our previous work and leads to an efficient implementation using a majorization-minimization procedure. Index Terms — Nonnegative matrix factorization (NMF), Itakura-Saito divergence, regularization by smoothness, audio signal representation, single-channel source separation.
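To make the "dictionary times activation" factorization concrete, here is a minimal sketch of plain (unpenalized) Itakura-Saito NMF using the widely used multiplicative updates. It is not the smoothness-regularized majorization-minimization algorithm proposed in the paper, whose penalty term the abstract does not specify. The function names `is_divergence` and `is_nmf`, the initialization, and the `eps` safeguard are assumptions made for illustration.

```python
import numpy as np


def is_divergence(V, V_hat):
    """Itakura-Saito divergence summed over all entries (V, V_hat > 0)."""
    R = V / V_hat
    return float(np.sum(R - np.log(R) - 1.0))


def is_nmf(V, K, n_iter=200, seed=0, eps=1e-12):
    """Factor a positive power spectrogram V (F x N) as W (F x K) @ H (K x N).

    Uses the standard multiplicative updates for the IS divergence,
    with no smoothness penalty on the activations H.
    """
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = rng.random((F, K)) + 0.1  # "dictionary" of spectral patterns
    H = rng.random((K, N)) + 0.1  # "activation" of each pattern over time
    for _ in range(n_iter):
        V_hat = W @ H + eps
        H *= (W.T @ (V / V_hat**2)) / (W.T @ (1.0 / V_hat) + eps)
        V_hat = W @ H + eps
        W *= ((V / V_hat**2) @ H.T) / ((1.0 / V_hat) @ H.T + eps)
    return W, H
```

A smoothness penalty of the kind described in the abstract would add a term coupling adjacent columns of `H`, which changes the `H` update; the sketch above only shows the unregularized baseline.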
Coding-based Informed Source Separation: Nonnegative Tensor Factorization Approach
, 2013
Abstract—Informed source separation (ISS) aims at reliably recovering sources from a mixture. To this purpose, it relies on the assumption that the original sources are available during an encoding stage. Given both sources and mixture, a side-information may be computed and transmitted along with the mixture, whereas the original sources are not available any longer. During a decoding stage, both mixture and side-information are processed to recover the sources. ISS is motivated by a number of specific applications including active listening and remixing of music, karaoke, audio gaming, etc. Most ISS techniques proposed so far rely on a source separation strategy and cannot achieve better results than oracle estimators. In this study, we introduce Coding-based ISS (CISS) and draw the connection between ISS and source coding. CISS amounts to encoding the sources using not only a model as in source coding but also the observation of the mixture. This strategy has several advantages over conventional ISS methods. First, it can reach any quality, provided sufficient bandwidth is available as in source coding. Second, it makes use of the mixture in order to reduce the bitrate required to transmit the sources, as in classical ISS. Furthermore, we introduce Nonnegative Tensor Factorization as a very efficient model for CISS and report rate-distortion results that strongly outperform the state of the art. Index Terms—Informed source separation, spatial audio object coding, source coding, constrained entropy quantization, probabilistic model, nonnegative tensor factorization.
Parallel Algorithms for Constrained Tensor Factorization via Alternating Direction Method of Multipliers
, 2014
Abstract—Tensor factorization has proven useful in a wide range of applications, from sensor array processing to communications, speech and audio signal processing, and machine learning. With few recent exceptions, all tensor factorization algorithms were originally developed for centralized, in-memory computation on a single machine; and the few that break away from this mold do not easily incorporate practically important constraints, such as nonnegativity. A new constrained tensor factorization framework is proposed in this paper, building upon the Alternating Direction Method of Multipliers (ADMoM). It is shown that this simplifies computations, bypassing the need to solve constrained optimization problems in each iteration; and it naturally leads to distributed algorithms suitable for parallel implementation. This opens the door for many emerging big data-enabled applications. The methodology is exemplified using nonnegativity as a baseline constraint, but the proposed framework can incorporate many other types of constraints. Numerical experiments are encouraging, indicating that ADMoM-based nonnegative tensor factorization (NTF) has high potential as an alternative to state-of-the-art approaches. Index Terms—Tensor decomposition, PARAFAC model, parallel algorithms.
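The claim that ADMoM avoids solving a constrained problem in each iteration can be illustrated on a single nonnegative least-squares subproblem of the kind that arises when one factor matrix is updated with the others held fixed. The sketch below is the generic textbook ADMM splitting (x = z, z ≥ 0), not the paper's full tensor algorithm; the function name `admm_nnls` and the parameters `rho` and `n_iter` are assumptions for illustration.

```python
import numpy as np


def admm_nnls(A, b, rho=1.0, n_iter=1000):
    """Approximately solve min 0.5*||A x - b||^2 subject to x >= 0 via ADMM.

    Splitting: minimize over x (unconstrained) and z (nonnegative) with x = z.
    Each iteration needs only a linear solve and an elementwise projection.
    """
    n = A.shape[1]
    # The system matrix is the same in every iteration, so factor it once.
    L = np.linalg.cholesky(A.T @ A + rho * np.eye(n))
    Atb = A.T @ b
    z = np.zeros(n)
    u = np.zeros(n)  # scaled dual variable
    for _ in range(n_iter):
        rhs = Atb + rho * (z - u)
        # Unconstrained least-squares step: solve (A^T A + rho I) x = rhs.
        x = np.linalg.solve(L.T, np.linalg.solve(L, rhs))
        # The constraint is handled by projecting onto the nonnegative orthant.
        z = np.maximum(0.0, x + u)
        # Dual update accumulates the disagreement between x and z.
        u += x - z
    return z  # z is exactly nonnegative by construction
```

Because the x-step is an ordinary linear solve, rows of a factor matrix can be updated independently, which is one way to read the abstract's remark about parallel implementation.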
KERNEL SPECTROGRAM MODELS FOR SOURCE SEPARATION
In this study, we introduce a new framework called Kernel Additive Modelling for audio spectrograms that can be used for multichannel source separation. It assumes that the spectrogram of a source at any time-frequency bin is close to its value in a neighbourhood indicated by a source-specific proximity kernel. The rationale for this model is to easily account for features like periodicity, stability over time or frequency, self-similarity, etc. In many cases, such local dynamics are indeed much more natural to assess than any global model such as a tensor factorization. This framework permits one to use different proximity kernels for different sources and to estimate them blindly using their mixtures only. Estimation is performed using a variant of the kernel backfitting algorithm that allows for multichannel mixtures and permits parallelization. Experimental results on the separation of vocals from musical backgrounds demonstrate the efficiency of the approach. Index Terms—audio source separation, spatial filtering, spectrogram models
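The idea of source-specific proximity kernels can be sketched in a heavily simplified single-channel form: one source is assumed stable along time, the other stable along frequency, and each model spectrogram is re-estimated as a local median of its current separated image. The multichannel variant described in the abstract is not reproduced here; the choice of the median as local estimator, the line-shaped kernels, and the names `median_along` and `kam_backfit` are all assumptions for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def median_along(X, size, axis):
    """Running median of odd length `size` along one axis, edge-padded."""
    pad = size // 2
    widths = [(0, 0), (0, 0)]
    widths[axis] = (pad, pad)
    Xp = np.pad(X, widths, mode="edge")
    windows = sliding_window_view(Xp, size, axis=axis)
    return np.median(windows, axis=-1)


def kam_backfit(V, kernels, n_iter=5, eps=1e-12):
    """Split a nonnegative spectrogram V (F x N) into one image per kernel.

    `kernels` is a list of (size, axis) pairs: axis=1 means the source is
    assumed locally constant over time, axis=0 over frequency.
    """
    S = [V / len(kernels) for _ in kernels]  # initial model spectrograms
    for _ in range(n_iter):
        total = sum(S) + eps
        # Soft-mask the mixture with the current models, then smooth each
        # separated image with its own proximity kernel.
        S = [
            median_along(V * s / total, size, axis)
            for s, (size, axis) in zip(S, kernels)
        ]
    total = sum(S) + eps
    return [V * s / total for s in S]
```

With `kernels = [(9, 1), (9, 0)]` the first output favours components that persist over time and the second favours components that are broadband within a frame.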