Results 1 - 8 of 8
Multichannel nonnegative tensor factorization with structured constraints for user-guided audio source separation
- in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP’11)
, 2011
Cited by 12 (6 self)
Separating multiple tracks from professionally produced music recordings (PPMRs) is still a challenging problem. We address this task with a user-guided approach in which the separation system is provided segmental information indicating the time activations of the particular instruments to separate. This information may typically be retrieved from manual annotation. We use a so-called multichannel nonnegative tensor factorization (NTF) model, in which the original sources are observed through a multichannel convolutive mixture and in which the source power spectrograms are jointly modeled by a 3-valence (time/frequency/source) tensor. Our user-guided separation method produced competitive results at the 2010 Signal Separation Evaluation Campaign, with sufficient quality for real-world music editing applications. Index Terms — Audio source separation, user-guided, nonnegative tensor factorization, generalized expectation maximization.
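As a rough illustration of how segmental time-activation information can constrain a factorization, the sketch below uses a plain single-channel NMF with Euclidean multiplicative updates, not the multichannel NTF model or the generalized EM algorithm of the paper. The function name `masked_nmf` and the binary `mask` layout (one row per component, one column per frame) are assumptions made for this example. Because the updates are multiplicative, any activation initialized to zero stays at zero, which is how the annotation is enforced here.

```python
import numpy as np

def masked_nmf(V, mask, n_iter=200, eps=1e-12, seed=0):
    """Approximate V (freq x time) by W @ H, with H forced to zero where mask == 0.

    mask is a hypothetical (n_components x time) binary array standing in for
    the user-provided time activations of each component.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_time = V.shape
    n_comp = mask.shape[0]
    W = rng.random((n_freq, n_comp)) + eps
    # Zero out activations in frames where the component is annotated as silent.
    H = (rng.random((n_comp, n_time)) + eps) * mask
    for _ in range(n_iter):
        # Multiplicative updates for the Euclidean cost; zeros in H are preserved.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

Calling `masked_nmf(V, mask)` on a nonnegative spectrogram `V` returns spectral templates `W` and activations `H` whose support follows the annotation.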
Informed source separation: Underdetermined source signal recovery from an instantaneous stereo mixture
- in Proc. IEEE WASPAA
Cited by 9 (4 self)
This paper presents a new technique that aims at solving an ill-posed source separation problem encountered in stereo mixtures. The proposed method is realized in an encoder-decoder framework: On the encoder side, a set of spectral envelopes is extracted from the original tracks, which are known. These envelopes are passed on to the decoder together with the stereo mixture, whereas the frequency resolution of the former is adapted to the critical bands, and their magnitude is logarithmically quantized. On the decoder side, the mixture signal is decomposed by time-frequency selective iterative spatial filtering guided by a source activity index, which is derived from the spectral envelope values. A comparison with a similar algorithm reveals that the novel approach yields a higher perceptual audio quality at a much lower data rate. Index Terms: Inverse problem, informed source separation, stereo, spatial filtering, psychoacoustics.
Coding-based Informed Source Separation: Nonnegative Tensor Factorization Approach
, 2013
Cited by 6 (3 self)
Informed source separation (ISS) aims at reliably recovering sources from a mixture. To this purpose, it relies on the assumption that the original sources are available during an encoding stage. Given both sources and mixture, a side-information may be computed and transmitted along with the mixture, whereas the original sources are not available any longer. During a decoding stage, both mixture and side-information are processed to recover the sources. ISS is motivated by a number of specific applications including active listening and remixing of music, karaoke, audio gaming, etc. Most ISS techniques proposed so far rely on a source separation strategy and cannot achieve better results than oracle estimators. In this study, we introduce Coding-based ISS (CISS) and draw the connection between ISS and source coding. CISS amounts to encoding the sources using not only a model as in source coding but also the observation of the mixture. This strategy has several advantages over conventional ISS methods. First, it can reach any quality, provided sufficient bandwidth is available as in source coding. Second, it makes use of the mixture in order to reduce the bitrate required to transmit the sources, as in classical ISS. Furthermore, we introduce Nonnegative Tensor Factorization as a very efficient model for CISS and report rate-distortion results that strongly outperform the state of the art. Index Terms: Informed source separation, spatial audio object coding, source coding, constrained entropy quantization, probabilistic model, nonnegative tensor factorization.
Learning Optimal Features for Polyphonic Audio-to-Score Alignment
Cited by 4 (0 self)
This paper addresses the design of feature functions for the matching of a musical recording to the symbolic representation of the piece (the score). These feature functions are defined as dissimilarity measures between the audio observations and template vectors corresponding to the score. By expressing the template construction as a linear mapping from the symbolic to the audio representation, one can learn the feature functions by optimizing the linear transformation. In this paper, we explore two different learning strategies. The first one uses a best-fit criterion (minimum divergence), while the second one exploits a discriminative framework based on a Conditional Random Fields model (maximum likelihood criterion). We evaluate the influence of the feature functions in an audio-to-score alignment task, on a large database of popular and classical polyphonic music. The results show that with several types of models, using different temporal constraints, the learned mappings have the potential to outperform the classic heuristic mappings. Several representations of the audio observations, along with several distance functions, are compared in this alignment task. Our experiments favor the symmetric Kullback-Leibler divergence. Moreover, both the spectrogram and a CQT-based representation turn out to provide very accurate alignments, detecting more than 97% of the onsets with a precision of 100 ms with our most complex system.
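For reference, the symmetric Kullback-Leibler divergence between an audio observation and a template vector can be sketched as follows. This is a minimal illustration, not the authors' feature function: the name `symmetric_kl`, the normalization of both vectors to unit sum, and the small `eps` floor are assumptions made here. Some definitions also include a factor of 1/2, which is omitted in this sketch.

```python
import numpy as np

def symmetric_kl(p, q, eps=1e-12):
    """Symmetric KL divergence KL(p||q) + KL(q||p) between two nonnegative vectors.

    Both inputs are floored at eps and normalized to sum to one (an assumption
    of this sketch), so they are compared as discrete distributions.
    """
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p = p / p.sum()
    q = q / q.sum()
    # KL(p||q) + KL(q||p) simplifies to sum((p - q) * log(p / q)).
    return float(np.sum((p - q) * np.log(p / q)))
```

The value is zero when the two normalized vectors coincide and grows as the observation departs from the template, in either direction.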
Low bitrate informed source separation of realistic mixtures
- in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
, 2013
Cited by 1 (1 self)
Demixing consists in recovering the sounds that compose a multi-channel mix. Important applications include karaoke or respatialization. Several approaches to this problem have been proposed in a coding/decoding framework, which are denoted either as spatial audio object coding or informed source separation. They assume that the constituent sounds are available at an encoding stage and used to compute a side-information transmitted to the end-user. At a decoding stage, only the mixtures and the side information are used to recover the sources. Here, we propose an advanced model, which encompasses many practical scenarios and makes it possible to reach bitrates as low as 0.5 kbps/source. First, the sources may be mono or multi-channel. Second, the mixing process is assumed to be diffuse, generalizing the usual linear-instantaneous or convolutive cases and permitting professional mixes to be processed. Third, the signals to be recovered may either be the original sources or their spatial images. Index Terms: audio upmixing, Wiener filtering, spatial audio object coding, informed source separation.
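The Wiener filtering named in the index terms can be illustrated, in a much simpler setting than the paper's, by a single-channel soft mask: each source estimate is the mixture STFT scaled by that source's share of the total power in every time-frequency bin. The sketch below assumes the source power spectrograms are already known (for instance decoded from side information); the function name `wiener_separate` and the array layout are hypothetical, and the diffuse multichannel model of the paper is not reproduced.

```python
import numpy as np

def wiener_separate(X, P, eps=1e-12):
    """Single-channel Wiener-like separation.

    X: complex mixture STFT, shape (freq, time).
    P: nonnegative source power spectrograms, shape (n_sources, freq, time).
    Returns complex source estimates with shape (n_sources, freq, time).
    """
    # Per-bin gain of each source: its power divided by the total power.
    gains = P / (P.sum(axis=0, keepdims=True) + eps)
    # Broadcast the mixture over the source axis and apply the gains.
    return gains * X[None, :, :]
```

Since the gains sum to (almost exactly) one in every bin, the source estimates add back up to the mixture.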
Signal Processing. Contents lists available at SciVerse ScienceDirect.
journal homepage: www.elsevier.com/locate/sigpro Residual enhanced visual vector as a compact signature for mobile
Coding-based Informed Source Separation: Nonnegative Tensor Factorization Approach
- IEEE Transactions on Audio, Speech and Language Processing
Informed source separation (ISS) aims at reliably recovering sources from a mixture. To this purpose, it relies on the assumption that the original sources are available during an encoding stage. Given both sources and mixture, a side-information may be computed and transmitted along with the mixture, whereas the original sources are not available any longer. During a decoding stage, both mixture and side-information are processed to recover the sources. ISS is motivated by a number of specific applications including active listening and remixing of music, karaoke, audio gaming, etc. Most ISS techniques proposed so far rely on a source separation strategy and cannot achieve better results than oracle estimators. In this study, we introduce Coding-based ISS (CISS) and draw the connection between ISS and source coding. CISS amounts to encoding the sources using not only a model as in source coding but also the observation of the mixture. This strategy has several advantages over conventional ISS methods. First, it can reach any quality, provided sufficient bandwidth is available as in source coding. Second, it makes use of the mixture in order to reduce the bitrate required to transmit the sources, as in classical ISS. Furthermore, we introduce Nonnegative Tensor Factorization as a very efficient model for CISS and report rate-distortion results that strongly outperform the state of the art. Index Terms: Informed source separation, spatial audio object coding, source coding, constrained entropy quantization, probabilistic model, nonnegative tensor factorization.
Audio inpainting.
, 2013
High resolution NMF for modeling mixtures of non-stationary signals in the time-frequency domain