Results 1 - 8 of 8
Multichannel nonnegative tensor factorization with structured constraints for user-guided audio source separation
- in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP’11), 2011
Cited by 12 (6 self)
Separating multiple tracks from professionally produced music recordings (PPMRs) is still a challenging problem. We address this task with a user-guided approach in which the separation system is provided segmental information indicating the time activations of the particular instruments to separate. This information may typically be retrieved from manual annotation. We use a so-called multichannel nonnegative tensor factorization (NTF) model, in which the original sources are observed through a multichannel convolutive mixture and in which the source power spectrograms are jointly modeled by a 3-valence (time/frequency/source) tensor. Our user-guided separation method produced competitive results at the 2010 Signal Separation Evaluation Campaign, with sufficient quality for real-world music editing applications. Index Terms — Audio source separation, user-guided, nonnegative tensor factorization, generalized expectation maximization.
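As a rough illustration of the model described in this abstract, the sketch below builds a (frequency/time/source) power tensor from three nonnegative factors and zeroes the activations of a source in frames where an annotation marks it as silent. The PARAFAC-style structure, the names `W`, `H`, `Q`, and `active`, and the toy sizes are assumptions made for illustration; the abstract does not specify them, and this is not the authors' estimation algorithm.

```python
import numpy as np

# Toy sizes (hypothetical): frequency bins, time frames, sources, components
F, N, J, K = 5, 8, 2, 4
rng = np.random.default_rng(0)

W = rng.random((F, K))  # spectral patterns, one column per component
H = rng.random((N, K))  # time activations, one column per component
# Assumed component-to-source assignment: components 0-1 -> source 0, 2-3 -> source 1
Q = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0]])

# Hypothetical user annotation: source 1 is silent during the first 3 frames
active = np.ones((N, J), dtype=bool)
active[:3, 1] = False

# Zero each component's activation in frames where its source is annotated inactive.
# (active @ Q)[n, k] equals the activity flag of the source that owns component k.
H_constrained = H * (active.astype(float) @ Q)

# Source power spectrogram tensor: P[f, n, j] = sum_k W[f, k] * H[n, k] * Q[j, k]
P = np.einsum("fk,nk,jk->fnj", W, H_constrained, Q)
print(P.shape)
```

With this structure, any frame annotated as inactive for a source has exactly zero modeled power for that source, which is one simple way segmental information can constrain a factorization.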
Majorization-minimization algorithm for smooth Itakura-Saito nonnegative matrix factorization
- in ICASSP, 2011
Cited by 6 (1 self)
Nonnegative matrix factorization (NMF) with the Itakura-Saito divergence has proven efficient for audio source separation and music transcription, where the signal power spectrogram is factored into a “dictionary” matrix times an “activation” matrix. Given the nature of audio signals it is expected that the activation coefficients exhibit smoothness along time frames. This may be enforced by penalizing the NMF objective function with an extra term reflecting smoothness of the activation coefficients. We propose a novel regularization term that solves some deficiencies of our previous work and leads to an efficient implementation using a majorization-minimization procedure. Index Terms: Nonnegative matrix factorization (NMF), Itakura-Saito divergence, regularization by smoothness, audio signal representation, single-channel source separation.
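For orientation, here is a minimal sketch of plain (unpenalized) Itakura-Saito NMF using the standard majorization-minimization multiplicative updates, in which the usual update ratio is raised to the power 1/2. It does not include the smoothness penalty proposed in the paper; the function names, the toy data, and the small constant `eps` are assumptions for illustration only.

```python
import numpy as np

def is_divergence(V, V_hat):
    """Itakura-Saito divergence between V and V_hat, summed over all entries."""
    R = V / V_hat
    return float(np.sum(R - np.log(R) - 1.0))

def is_nmf(V, n_components, n_iter=200, seed=0, eps=1e-12):
    """Factor a positive matrix V (F x N) as W @ H under the IS divergence.

    W is the "dictionary" (F x K) and H the "activation" matrix (K x N).
    Unpenalized sketch: no smoothness term on H is included here.
    """
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = rng.random((F, n_components)) + eps
    H = rng.random((n_components, N)) + eps
    for _ in range(n_iter):
        V_hat = W @ H + eps
        # MM update for H: ratio of gradient parts, raised to the power 1/2
        H *= ((W.T @ (V / V_hat**2)) / (W.T @ (1.0 / V_hat))) ** 0.5
        V_hat = W @ H + eps
        # MM update for W, by symmetry
        W *= (((V / V_hat**2) @ H.T) / ((1.0 / V_hat) @ H.T)) ** 0.5
    return W, H

# Toy "power spectrogram": an exactly rank-3 positive matrix plus a small offset
rng = np.random.default_rng(1)
V = rng.random((6, 3)) @ rng.random((3, 40)) + 1e-3
W, H = is_nmf(V, n_components=3)
print(is_divergence(V, W @ H))
```

A smoothness penalty of the kind the abstract mentions would add a term coupling adjacent columns of `H`, which changes the `H` update; that part is deliberately left out here.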
Coding-based Informed Source Separation: Nonnegative Tensor Factorization Approach
- 2013
Cited by 6 (3 self)
Informed source separation (ISS) aims at reliably recovering sources from a mixture. To this purpose, it relies on the assumption that the original sources are available during an encoding stage. Given both sources and mixture, a side-information may be computed and transmitted along with the mixture, whereas the original sources are not available any longer. During a decoding stage, both mixture and side-information are processed to recover the sources. ISS is motivated by a number of specific applications including active listening and remixing of music, karaoke, audio gaming, etc. Most ISS techniques proposed so far rely on a source separation strategy and cannot achieve better results than oracle estimators. In this study, we introduce Coding-based ISS (CISS) and draw the connection between ISS and source coding. CISS amounts to encoding the sources using not only a model as in source coding but also the observation of the mixture. This strategy has several advantages over conventional ISS methods. First, it can reach any quality, provided sufficient bandwidth is available as in source coding. Second, it makes use of the mixture in order to reduce the bitrate required to transmit the sources, as in classical ISS. Furthermore, we introduce Nonnegative Tensor Factorization as a very efficient model for CISS and report rate-distortion results that strongly outperform the state of the art. Index Terms: Informed source separation, spatial audio object coding, source coding, constrained entropy quantization, probabilistic model, nonnegative tensor factorization.
Parallel Algorithms for Constrained Tensor Factorization via Alternating Direction Method of Multipliers
- 2014
Cited by 2 (0 self)
Tensor factorization has proven useful in a wide range of applications, from sensor array processing to communications, speech and audio signal processing, and machine learning. With few recent exceptions, all tensor factorization algorithms were originally developed for centralized, in-memory computation on a single machine; and the few that break away from this mold do not easily incorporate practically important constraints, such as non-negativity. A new constrained tensor factorization framework is proposed in this paper, building upon the Alternating Direction Method of Multipliers (ADMoM). It is shown that this simplifies computations, bypassing the need to solve constrained optimization problems in each iteration; and it naturally leads to distributed algorithms suitable for parallel implementation. This opens the door for many emerging big data-enabled applications. The methodology is exemplified using non-negativity as a baseline constraint, but the proposed framework can incorporate many other types of constraints. Numerical experiments are encouraging, indicating that ADMoM-based non-negative tensor factorization (NTF) has high potential as an alternative to state-of-the-art approaches. Index Terms: Tensor decomposition, PARAFAC model, parallel algorithms.
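The claim that ADMoM avoids solving a constrained problem in each iteration can be seen on a much smaller problem. The sketch below applies the textbook scaled-form ADMM to a single nonnegative least-squares problem: the constraint is handled by a separate projection step, so the main update is an unconstrained linear solve. This is not the paper's tensor algorithm; `admm_nnls`, the penalty `rho`, and the toy data are assumptions for illustration.

```python
import numpy as np

def admm_nnls(A, b, rho=1.0, n_iter=1000):
    """Minimize 0.5 * ||A x - b||^2 subject to x >= 0 via scaled-form ADMM.

    Splitting: x carries the least-squares term, z carries the constraint,
    and the consensus x = z is enforced through the scaled dual variable u.
    """
    n = A.shape[1]
    G = A.T @ A + rho * np.eye(n)  # system matrix, fixed across iterations
    Atb = A.T @ b
    z = np.zeros(n)
    u = np.zeros(n)
    for _ in range(n_iter):
        # x-update: unconstrained regularized least squares (a linear solve)
        x = np.linalg.solve(G, Atb + rho * (z - u))
        # z-update: projection onto the nonnegative orthant
        z = np.maximum(0.0, x + u)
        # dual update: accumulate the consensus residual
        u += x - z
    return z

# Toy problem with a known nonnegative solution that contains zeros
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 5))
x_true = np.array([1.0, 0.0, 2.0, 0.0, 0.5])
b = A @ x_true
x_hat = admm_nnls(A, b, rho=10.0)
print(np.round(x_hat, 4))
```

Swapping the projection in the z-update for another proximal operator is how other constraints would enter in this style of splitting; the value of `rho` affects convergence speed and is a tuning choice here.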
Coding-based Informed Source Separation: Nonnegative Tensor Factorization Approach
- in IEEE Transactions on Audio, Speech and Language Processing
Informed source separation (ISS) aims at reliably recovering sources from a mixture. To this purpose, it relies on the assumption that the original sources are available during an encoding stage. Given both sources and mixture, a side-information may be computed and transmitted along with the mixture, whereas the original sources are not available any longer. During a decoding stage, both mixture and side-information are processed to recover the sources. ISS is motivated by a number of specific applications including active listening and remixing of music, karaoke, audio gaming, etc. Most ISS techniques proposed so far rely on a source separation strategy and cannot achieve better results than oracle estimators. In this study, we introduce Coding-based ISS (CISS) and draw the connection between ISS and source coding. CISS amounts to encoding the sources using not only a model as in source coding but also the observation of the mixture. This strategy has several advantages over conventional ISS methods. First, it can reach any quality, provided sufficient bandwidth is available as in source coding. Second, it makes use of the mixture in order to reduce the bitrate required to transmit the sources, as in classical ISS. Furthermore, we introduce Nonnegative Tensor Factorization as a very efficient model for CISS and report rate-distortion results that strongly outperform the state of the art. Index Terms: Informed source separation, spatial audio object coding, source coding, constrained entropy quantization, probabilistic model, nonnegative tensor factorization.
Kernel Spectrogram Models for Source Separation
In this study, we introduce a new framework called Kernel Additive Modelling for audio spectrograms that can be used for multichannel source separation. It assumes that the spectrogram of a source at any time-frequency bin is close to its value in a neighbourhood indicated by a source-specific proximity kernel. The rationale for this model is to easily account for features like periodicity, stability over time or frequency, self-similarity, etc. In many cases, such local dynamics are indeed much more natural to assess than any global model such as a tensor factorization. This framework permits one to use different proximity kernels for different sources and to estimate them blindly using their mixtures only. Estimation is performed using a variant of the kernel backfitting algorithm that allows for multichannel mixtures and permits parallelization. Experimental results on the separation of vocals from musical backgrounds demonstrate the efficiency of the approach. Index Terms: audio source separation, spatial filtering, spectrogram models.