Notes on nonnegative tensor factorization of the spectrogram for audio source separation : statistical insights and towards self-clustering of the spatial cues (2010)

by C Févotte, A Ozerov
Venue: in 7th International Symposium on Computer Music Modeling and Retrieval (CMMR)

Results 1 - 8 of 8

Multichannel nonnegative tensor factorization with structured constraints for user-guided audio source separation

by Alexey Ozerov, Cédric Févotte, Raphaël Blouet, Jean-Louis Durrieu - in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP’11), 2011
Abstract - Cited by 12 (6 self)
Separating multiple tracks from professionally produced music recordings (PPMRs) is still a challenging problem. We address this task with a user-guided approach in which the separation system is provided segmental information indicating the time activations of the particular instruments to separate. This information may typically be retrieved from manual annotation. We use a so-called multichannel nonnegative tensor factorization (NTF) model, in which the original sources are observed through a multichannel convolutive mixture and in which the source power spectrograms are jointly modeled by a 3-valence (time/frequency/source) tensor. Our user-guided separation method produced competitive results at the 2010 Signal Separation Evaluation Campaign, with sufficient quality for real-world music editing applications. Index Terms — Audio source separation, user-guided, nonnegative tensor factorization, generalized expectation maximization.

Citation Context

...] we assumed separate NMF model for each of the sources, equivalent to assuming only one nonzero coefficient per row of Q while this assumption is now relaxed. The interested reader may also refer to [9] for related discussions. Let us also mention that our setting is different from [3] which considers a simpler “informed” source separation application in which the parameters Q, W and H are learnt fr...
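
The remark about Q can be illustrated with a minimal sketch. It assumes the usual trilinear NTF form, in which the power spectrogram of source j is the sum over components k of Q[k, j] W[f, k] H[k, n], and it assumes Q is laid out as components by sources. Neither the layout nor the sizes below are given in the excerpt; they are hypothetical. With exactly one nonzero per row of Q, the joint model reduces to one independent NMF per source:

```python
import numpy as np

rng = np.random.default_rng(0)
F, N, K, J = 6, 5, 4, 2          # frequencies, frames, components, sources (hypothetical sizes)

W = rng.random((F, K))           # spectral patterns, one column per component
H = rng.random((K, N))           # temporal activations, one row per component

# Assumed layout: Q is K x J with exactly one nonzero per row,
# so every component k is assigned to a single source j.
assignment = np.array([0, 0, 1, 1])      # component -> source (hypothetical)
Q = np.zeros((K, J))
Q[np.arange(K), assignment] = 1.0

# Joint NTF model of the source power spectrograms:
# V[j, f, n] = sum_k Q[k, j] * W[f, k] * H[k, n]
V = np.einsum("kj,fk,kn->jfn", Q, W, H)

# Separate NMF models, each built only from the components of its own source
V_separate = np.stack(
    [W[:, assignment == j] @ H[assignment == j, :] for j in range(J)]
)

print(np.allclose(V, V_separate))
```

Relaxing the constraint amounts to letting rows of Q hold several nonzero entries, so that a component can contribute to more than one source.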

Majorization-minimization algorithm for smooth Itakura-Saito nonnegative matrix factorization

by Cédric Févotte - in ICASSP, 2011
Abstract - Cited by 6 (1 self)
Nonnegative matrix factorization (NMF) with the Itakura-Saito divergence has proven efficient for audio source separation and music transcription, where the signal power spectrogram is factored into a “dictionary” matrix times an “activation” matrix. Given the nature of audio signals it is expected that the activation coefficients exhibit smoothness along time frames. This may be enforced by penalizing the NMF objective function with an extra term reflecting smoothness of the activation coefficients. We propose a novel regularization term that solves some deficiencies of our previous work and leads to an efficient implementation using a majorization-minimization procedure. Index Terms: Nonnegative matrix factorization (NMF), Itakura-Saito divergence, regularization by smoothness, audio signal representation, single-channel source separation.

Citation Context

... shown to produce satisfying results for music editing tasks. As for perspective we intend to incorporate our algorithm to nonnegative tensor factorization settings for multichannel source separation [11]. This will allow thorough evaluation of the system on a specific task (for which standard test data and evaluation criteria exist) and will in particular allow to quantify the influence of λ on the r...
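
To make the idea of a smoothness-penalized objective concrete, here is a minimal sketch. The excerpt does not state the paper's regularization term, so the squared first-difference penalty below is only an assumed stand-in, and the function names and the weight `lam` (playing the role of λ) are hypothetical:

```python
import numpy as np

def is_divergence(V, V_hat):
    """Itakura-Saito divergence summed over all entries: x/y - log(x/y) - 1."""
    ratio = V / V_hat
    return float(np.sum(ratio - np.log(ratio) - 1.0))

def smoothness_penalty(H):
    """Assumed penalty: squared differences between consecutive time frames."""
    return float(np.sum(np.diff(H, axis=1) ** 2))

def penalized_objective(V, W, H, lam):
    """IS fit term plus lam times the (assumed) smoothness penalty on H."""
    return is_divergence(V, W @ H) + lam * smoothness_penalty(H)

rng = np.random.default_rng(0)
W = rng.random((4, 2)) + 0.1               # dictionary: 4 frequencies, 2 components
H_smooth = np.ones((2, 6))                 # constant activations over 6 frames
H_rough = np.tile([0.5, 1.5], (2, 3))      # activations alternating frame to frame
V = W @ H_smooth                           # spectrogram generated by the smooth activations

print(penalized_objective(V, W, H_smooth, lam=1.0))
print(penalized_objective(V, W, H_rough, lam=1.0))
```

Larger values of `lam` make rough activations more costly relative to the data fit, which is the trade-off that the citing text proposes to quantify.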

Coding-based Informed Source Separation: Nonnegative Tensor Factorization Approach

by Alexey Ozerov, Antoine Liutkus, Gaël Richard, 2013
Abstract - Cited by 6 (3 self)
Informed source separation (ISS) aims at reliably recovering sources from a mixture. To this purpose, it relies on the assumption that the original sources are available during an encoding stage. Given both sources and mixture, a side-information may be computed and transmitted along with the mixture, whereas the original sources are not available any longer. During a decoding stage, both mixture and side-information are processed to recover the sources. ISS is motivated by a number of specific applications including active listening and remixing of music, karaoke, audio gaming, etc. Most ISS techniques proposed so far rely on a source separation strategy and cannot achieve better results than oracle estimators. In this study, we introduce Coding-based ISS (CISS) and draw the connection between ISS and source coding. CISS amounts to encoding the sources using not only a model as in source coding but also the observation of the mixture. This strategy has several advantages over conventional ISS methods. First, it can reach any quality, provided sufficient bandwidth is available as in source coding. Second, it makes use of the mixture in order to reduce the bitrate required to transmit the sources, as in classical ISS. Furthermore, we introduce Nonnegative Tensor Factorization as a very efficient model for CISS and report rate-distortion results that strongly outperform the state of the art. Index Terms: Informed source separation, spatial audio object coding, source coding, constrained entropy quantization, probabilistic model, nonnegative tensor factorization.

Citation Context

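...is defined by (5) and $d_{IS}(x|y) = x/y - \log(x/y) - 1$ is the Itakura-Saito (IS) divergence. The optimization of criterion (17) can be achieved by iterating the following multiplicative updates [10], [14], [37]:

$$q_{jk} \leftarrow q_{jk} \left( \frac{\sum_{f,n} w_{fk} h_{nk} p_{jfn} v_{jfn}^{-2}}{\sum_{f,n} w_{fk} h_{nk} v_{jfn}^{-1}} \right), \quad (18)$$

$$w_{fk} \leftarrow w_{fk} \left( \frac{\sum_{j,n} h_{nk} q_{jk} p_{jfn} v_{jfn}^{-2}}{\sum_{j,n} h_{nk} q_{jk} v_{jfn}^{-1}} \right), \quad (19)$$

$$h_{nk} \leftarrow h_{nk} \left( \frac{\sum_{j,f} w_{fk} q_{jk} p_{jfn} v_{jfn}^{-2}}{\sum_{j,f} w_{fk} q_{jk} v_{jfn}^{-1}} \right). \quad (20)$$

2...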
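
The multiplicative updates quoted in this context can be sketched in a few lines of NumPy. The excerpt does not reproduce equation (5), so the sketch assumes the usual trilinear model in which v_jfn is the sum over k of q_jk w_fk h_nk, with p_jfn the observed power values. Function names, array sizes and the number of iterations are hypothetical:

```python
import numpy as np

def model(Q, W, H):
    """Assumed NTF model: v[j, f, n] = sum_k Q[j, k] * W[f, k] * H[n, k]."""
    return np.einsum("jk,fk,nk->jfn", Q, W, H)

def is_divergence(P, V):
    """Itakura-Saito divergence summed over all entries."""
    R = P / V
    return float(np.sum(R - np.log(R) - 1.0))

def mu_step(P, Q, W, H):
    """One sweep of multiplicative updates for Q, W and H in turn."""
    V = model(Q, W, H)
    Q = Q * (np.einsum("fk,nk,jfn->jk", W, H, P / V**2)
             / np.einsum("fk,nk,jfn->jk", W, H, 1.0 / V))
    V = model(Q, W, H)
    W = W * (np.einsum("nk,jk,jfn->fk", H, Q, P / V**2)
             / np.einsum("nk,jk,jfn->fk", H, Q, 1.0 / V))
    V = model(Q, W, H)
    H = H * (np.einsum("fk,jk,jfn->nk", W, Q, P / V**2)
             / np.einsum("fk,jk,jfn->nk", W, Q, 1.0 / V))
    return Q, W, H

rng = np.random.default_rng(0)
J, F, N, K = 2, 8, 10, 3                      # hypothetical sizes
P = rng.random((J, F, N)) + 0.1               # stand-in for observed power values
Q, W, H = (rng.random((d, K)) + 0.1 for d in (J, F, N))

history = [is_divergence(P, model(Q, W, H))]
for _ in range(50):
    Q, W, H = mu_step(P, Q, W, H)
    history.append(is_divergence(P, model(Q, W, H)))
print(history[0], history[-1])
```

Because each factor is multiplied by a ratio of nonnegative sums, nonnegativity is preserved automatically, and the factors stay fixed whenever the model already equals the data.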

Parallel Algorithms for Constrained Tensor Factorization via Alternating Direction Method of Multipliers

by Athanasios P. Liavas, Nicholas D. Sidiropoulos, 2014
Abstract - Cited by 2 (0 self)
Tensor factorization has proven useful in a wide range of applications, from sensor array processing to communications, speech and audio signal processing, and machine learning. With few recent exceptions, all tensor factorization algorithms were originally developed for centralized, in-memory computation on a single machine; and the few that break away from this mold do not easily incorporate practically important constraints, such as non-negativity. A new constrained tensor factorization framework is proposed in this paper, building upon the Alternating Direction Method of Multipliers (ADMoM). It is shown that this simplifies computations, bypassing the need to solve constrained optimization problems in each iteration; and it naturally leads to distributed algorithms suitable for parallel implementation. This opens the door for many emerging big data-enabled applications. The methodology is exemplified using non-negativity as a baseline constraint, but the proposed framework can incorporate many other types of constraints. Numerical experiments are encouraging, indicating that ADMoM-based non-negative tensor factorization (NTF) has high potential as an alternative to state-of-the-art approaches. Index Terms: Tensor decomposition, PARAFAC model, parallel algorithms.

Citation Context

...on1 has proven useful in a wide range of signal processing applications, such as direction of arrival estimation [2], communication signal intelligence [3], and speech and audio signal separation [4], [5], as well as cross-disciplinary areas, such as community detection in social networks Manuscript received August 30, 2014; revised January 21, 2015, May 05, 2015, and June 17, 2015; accepted June 23, ...
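
The abstract's point that ADMoM avoids solving a constrained problem at each iteration can be illustrated on a much smaller problem than tensor factorization. The sketch below applies the textbook ADMM splitting to a single nonnegative least-squares problem: each iteration alternates an unconstrained linear solve, a projection onto the nonnegative orthant, and a dual update. It is not the authors' algorithm; the function name, the penalty `rho` and the iteration count are assumptions chosen for illustration:

```python
import numpy as np

def admm_nnls(A, b, rho=1.0, n_iter=500):
    """Approximately solve min 0.5 * ||A x - b||^2 subject to x >= 0 with ADMM."""
    n = A.shape[1]
    z = np.zeros(n)                      # nonnegative copy of x
    u = np.zeros(n)                      # scaled dual variable
    # The regularized normal-equation matrix is factored once and reused.
    L = np.linalg.cholesky(A.T @ A + rho * np.eye(n))
    Atb = A.T @ b
    for _ in range(n_iter):
        rhs = Atb + rho * (z - u)
        x = np.linalg.solve(L.T, np.linalg.solve(L, rhs))   # unconstrained step
        z = np.maximum(0.0, x + u)                          # projection onto x >= 0
        u = u + x - z                                       # dual update
    return z

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 5))
x_true = np.array([1.0, 2.0, 3.0, 1.0, 1.5])
b = A @ x_true
x_hat = admm_nnls(A, b)
print(x_hat)
```

In an alternating scheme for nonnegative tensor factorization, a subproblem of this kind would be solved for each factor in turn, and the row-wise independence of such subproblems is what makes parallel implementations natural.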

Coding-based Informed Source Separation: Nonnegative Tensor Factorization Approach

by Alexey Ozerov, Antoine Liutkus, Gaël Richard - in IEEE Transactions on Audio, Speech and Language Processing
Abstract
Informed source separation (ISS) aims at reliably recovering sources from a mixture. To this purpose, it relies on the assumption that the original sources are available during an encoding stage. Given both sources and mixture, a side-information may be computed and transmitted along with the mixture, whereas the original sources are not available any longer. During a decoding stage, both mixture and side-information are processed to recover the sources. ISS is motivated by a number of specific applications including active listening and remixing of music, karaoke, audio gaming, etc. Most ISS techniques proposed so far rely on a source separation strategy and cannot achieve better results than oracle estimators. In this study, we introduce Coding-based ISS (CISS) and draw the connection between ISS and source coding. CISS amounts to encoding the sources using not only a model as in source coding but also the observation of the mixture. This strategy has several advantages over conventional ISS methods. First, it can reach any quality, provided sufficient bandwidth is available as in source coding. Second, it makes use of the mixture in order to reduce the bitrate required to transmit the sources, as in classical ISS. Furthermore, we introduce Nonnegative Tensor Factorization as a very efficient model for CISS and report rate-distortion results that strongly outperform the state of the art. Index Terms: Informed source separation, spatial audio object coding, source coding, constrained entropy quantization, probabilistic model, nonnegative tensor factorization.

Citation Context

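...is defined by (5) and $d_{IS}(x|y) = x/y - \log(x/y) - 1$ is the Itakura-Saito (IS) divergence. The optimization of criterion (17) can be achieved by iterating the following multiplicative updates [10], [14], [37]:

$$q_{jk} \leftarrow q_{jk} \left( \frac{\sum_{f,n} w_{fk} h_{nk} p_{jfn} v_{jfn}^{-2}}{\sum_{f,n} w_{fk} h_{nk} v_{jfn}^{-1}} \right), \quad (18)$$

$$w_{fk} \leftarrow w_{fk} \left( \frac{\sum_{j,n} h_{nk} q_{jk} p_{jfn} v_{jfn}^{-2}}{\sum_{j,n} h_{nk} q_{jk} v_{jfn}^{-1}} \right), \quad (19)$$

$$h_{nk} \leftarrow h_{nk} \left( \frac{\sum_{j,f} w_{fk} q_{jk} p_{jfn} v_{jfn}^{-2}}{\sum_{j,f} w_{fk} q_{jk} v_{jfn}^{-1}} \right). \quad (20)$$

2...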
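
The divergence defined in this context depends on x and y only through their ratio, which a short check makes visible. The function name `d_is` is a hypothetical label for the formula as quoted:

```python
import math

def d_is(x, y):
    """Itakura-Saito divergence between two positive scalars."""
    r = x / y
    return r - math.log(r) - 1.0

print(d_is(3.0, 3.0))                          # zero when x equals y
print(d_is(2.0, 4.0), d_is(200.0, 400.0))      # unchanged when x and y are scaled together
print(d_is(1.0, 2.0), d_is(2.0, 1.0))          # not symmetric in its arguments
```

This scale invariance is a commonly cited reason for using the IS divergence on power spectrograms, where low-energy and high-energy coefficients then contribute comparably to the fit.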

learning of

by Olivier Mangin, David Filliat
Abstract
A bag-of-features framework for incremental

unknown title

by Rennes Bretagne-atlantique, 2010
Abstract
Activity Report

Citation Context

...ring. This framework makes it possible to combine a range of existing spectral and spatial source models as well as to design novel advanced models, whose potential was evaluated in [42], [70], [50], [61], [69]. In addition, we showed the benefit of using the empirical mixture covariance and an auditory-motivated frequency scale as the input representation [56]. 6.5.2. Improved spatial models for reve...

KERNEL SPECTROGRAM MODELS FOR SOURCE SEPARATION

by unknown authors
Abstract
In this study, we introduce a new framework called Kernel Additive Modelling for audio spectrograms that can be used for multichannel source separation. It assumes that the spectrogram of a source at any time-frequency bin is close to its value in a neighbourhood indicated by a source-specific proximity kernel. The rationale for this model is to easily account for features like periodicity, stability over time or frequency, self-similarity, etc. In many cases, such local dynamics are indeed much more natural to assess than any global model such as a tensor factorization. This framework permits one to use different proximity kernels for different sources and to estimate them blindly using their mixtures only. Estimation is performed using a variant of the kernel backfitting algorithm that allows for multichannel mixtures and permits parallelization. Experimental results on the separation of vocals from musical backgrounds demonstrate the efficiency of the approach. Index Terms: audio source separation, spatial filtering, spectrogram models
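
The notion of a source-specific proximity kernel can be illustrated with a toy single-channel sketch. Each source's spectrogram is estimated at a bin from the values found in that source's neighbourhood: a horizontal kernel suits a source that is stable over time, a vertical kernel one that is stable over frequency. Using the median as the local estimate, the kernel shapes and the toy data are assumptions made for illustration; the iterative kernel backfitting and the multichannel handling described in the abstract are not shown:

```python
import numpy as np

def kernel_median_estimate(S, offsets):
    """Estimate a source spectrogram as the median of S over a proximity kernel.

    offsets is a list of (frequency shift, time shift) pairs defining the kernel.
    """
    F, N = S.shape
    est = np.empty_like(S)
    for f in range(F):
        for n in range(N):
            vals = [S[f + df, n + dn] for df, dn in offsets
                    if 0 <= f + df < F and 0 <= n + dn < N]
            est[f, n] = np.median(vals)
    return est

# Toy mixture spectrogram: a steady tone (one frequency row) plus a click (one time frame)
S = np.zeros((5, 7))
S[2, :] += 1.0
S[:, 3] += 5.0

time_kernel = [(0, dn) for dn in range(-2, 3)]   # neighbours along time
freq_kernel = [(df, 0) for df in range(-2, 3)]   # neighbours along frequency

tone_est = kernel_median_estimate(S, time_kernel)
click_est = kernel_median_estimate(S, freq_kernel)
print(tone_est[2], click_est[:, 3])
```

On this toy input the time kernel keeps the steady tone and rejects the click, while the frequency kernel does the opposite, which is the kind of local regularity the abstract contrasts with global models such as a tensor factorization.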