A general modular framework for audio source separation
- in "Proc. 9th Int. Conf. on Latent Variable Analysis and Signal Separation (LVA/ICA
Cited by 8 (4 self)
Most audio source separation methods are developed for a particular scenario characterized by the number of sources and channels and the characteristics of the sources and the mixing process. In this paper we introduce a general modular audio source separation framework based on a library of flexible source models that enable the incorporation of prior knowledge about the characteristics of each source. First, this framework generalizes several existing audio source separation methods, while bringing a common formulation for them. Second, it makes it possible to design and implement new efficient methods that have not yet been reported in the literature. We first introduce the framework by describing the flexible model, explaining its generality, and summarizing our modular implementation using a Generalized Expectation-Maximization algorithm. Finally, we illustrate the above-mentioned capabilities of the framework by applying it in several new and existing configurations to different source separation scenarios.
Active-set Newton algorithm for overcomplete non-negative representations of audio
- IEEE Transactions on Audio, Speech, and Language Processing
, 2013
Cited by 8 (8 self)
This paper proposes a computationally efficient algorithm for estimating the non-negative weights of linear combinations of the atoms of large-scale audio dictionaries, so that the generalized Kullback-Leibler divergence between an audio observation and the model is minimized. This linear model has been found useful in many audio signal processing tasks, but the existing algorithms are computationally slow when a large number of atoms is used. The proposed algorithm is based on iteratively updating a set of active atoms, with the weights updated using the Newton method and the step size estimated such that the weights remain non-negative. Algorithm convergence evaluations on representing audio spectra that are mixtures of two speakers show that with all the tested dictionary sizes the proposed method reaches a much lower value of the divergence than can be obtained by conventional algorithms, and is up to 8 times faster. A source separation evaluation revealed that when using large dictionaries, the proposed method produces a better separation quality in less time.
Index Terms: acoustic signal analysis, audio source separation, supervised source separation, non-negative matrix factorization, Newton algorithm, convex optimization, sparse coding, sparse representation
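For orientation, the problem stated in this abstract (non-negative weights of a fixed dictionary that minimize the generalized Kullback-Leibler divergence) can be sketched with the standard multiplicative update. This is one commonly used conventional algorithm, not the active-set Newton method the paper proposes. The function names `kl_divergence` and `mu_weights`, the dictionary size, and the synthetic data are assumptions made for illustration only.

```python
import numpy as np

def kl_divergence(x, y, eps=1e-12):
    """Generalized Kullback-Leibler divergence D(x || y) for non-negative vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sum(x * np.log((x + eps) / (y + eps)) - x + y))

def mu_weights(x, B, n_iter=200, eps=1e-12, seed=0):
    """Estimate non-negative weights w so that B @ w approximates x.

    B is a fixed non-negative dictionary (features x atoms) and x is one
    non-negative observation, e.g. a magnitude spectrum.
    """
    rng = np.random.default_rng(seed)
    w = rng.random(B.shape[1]) + 0.1       # strictly positive starting point
    col_sums = B.sum(axis=0) + eps         # denominator of the update, B^T 1
    history = []
    for _ in range(n_iter):
        model = B @ w + eps
        w *= (B.T @ (x / model)) / col_sums    # multiplicative step keeps w >= 0
        history.append(kl_divergence(x, B @ w))
    return w, history

# Synthetic example (assumed data): an observation built from two of ten atoms.
rng = np.random.default_rng(1)
B = rng.random((30, 10))
w_true = np.zeros(10)
w_true[[1, 4]] = [2.0, 0.5]
x = B @ w_true
w, history = mu_weights(x, B)
print(f"divergence after first / last iteration: {history[0]:.3e} / {history[-1]:.3e}")
```

Each iteration of this update touches every atom, which gives a rough sense of why cost grows with dictionary size, the issue the abstract raises about existing algorithms.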
New formulations and efficient algorithms for multichannel NMF
- in Proc. WASPAA ’11
, 2011
Cited by 7 (3 self)
This paper proposes new formulations and algorithms for a multichannel extension of nonnegative matrix factorization (NMF), intending convolutive sound source separation with multiple microphones. The proposed formulation employs Hermitian positive semidefinite matrices to represent a multichannel version of non-negative elements. Such matrices are basically estimated for NMF bases, but a source separation task can be performed by introducing variables that relate NMF bases and sources. Efficient optimization algorithms in the form of multiplicative updates are derived by using properly designed auxiliary functions. Experimental results show that two instrumental sounds coming from different directions were successfully separated by the proposed algorithm.
Index Terms: nonnegative matrix factorization, multichannel, positive semidefinite, auxiliary function, source separation
Coding-based Informed Source Separation: Nonnegative Tensor Factorization Approach
, 2013
Cited by 6 (3 self)
Informed source separation (ISS) aims at reliably recovering sources from a mixture. To this end, it relies on the assumption that the original sources are available during an encoding stage. Given both sources and mixture, side-information may be computed and transmitted along with the mixture, whereas the original sources are not available any longer. During a decoding stage, both mixture and side-information are processed to recover the sources. ISS is motivated by a number of specific applications including active listening and remixing of music, karaoke, audio gaming, etc. Most ISS techniques proposed so far rely on a source separation strategy and cannot achieve better results than oracle estimators. In this study, we introduce Coding-based ISS (CISS) and draw the connection between ISS and source coding. CISS amounts to encoding the sources using not only a model as in source coding but also the observation of the mixture. This strategy has several advantages over conventional ISS methods. First, it can reach any quality, provided sufficient bandwidth is available as in source coding. Second, it makes use of the mixture in order to reduce the bitrate required to transmit the sources, as in classical ISS. Furthermore, we introduce Nonnegative Tensor Factorization as a very efficient model for CISS and report rate-distortion results that strongly outperform the state of the art.
Index Terms: Informed source separation, spatial audio object coding, source coding, constrained entropy quantization, probabilistic model, nonnegative tensor factorization.
Multichannel audio upmixing based on non-negative tensor factorization representation
- In IEEE Workshop Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz
, 2011
Cited by 6 (0 self)
This paper proposes a new spatial audio coding (SAC) method that is based on parametrization of multichannel audio by sound objects using non-negative tensor factorization (NTF). The spatial parameters are estimated using a perceptually motivated NTF model and are used for upmixing a downmixed and encoded mixture signal. The performance of the proposed coding is evaluated using listening tests, which show the coding performance to be on a par with conventional SAC methods. Additionally, the proposed coding enables controlling the upmix content by meaningful objects.
Index Terms: Spatial audio coding, Object-based audio coding, Non-negative tensor factorization
Under-determined convolutive blind source separation using spatial covariance models
- In: Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP
, 2010
Cited by 6 (4 self)
This paper deals with the problem of under-determined convolutive blind source separation. We model the contribution of each source to all mixture channels in the time-frequency domain as a zero-mean Gaussian random variable whose covariance encodes the spatial properties of the source. We consider two covariance models and address the estimation of their parameters from the recorded mixture by a suitable initialization scheme followed by an iterative expectation-maximization (EM) procedure in each frequency bin. We then align the order of the estimated sources across all frequency bins based on their estimated directions of arrival (DOA). Experimental results over a stereo reverberant speech mixture show the effectiveness of the proposed approach.
Index Terms: Convolutive blind source separation, under-determined mixtures, spatial covariance models, EM algorithm, permutation problem.
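To make the Gaussian model in this abstract concrete: if each source contribution at a time-frequency bin is zero-mean Gaussian with covariance v_j R_j (a scalar variance times a spatial covariance matrix), then the minimum mean-square-error estimate of each contribution is a multichannel Wiener filter applied to the mixture. The abstract does not spell out this step, so the sketch below is an illustration under that assumption, not the authors' implementation. The function name `wiener_source_images` and the random parameters are hypothetical, and parameter estimation by EM is not shown.

```python
import numpy as np

def wiener_source_images(x, variances, spatial_covs):
    """Estimate each source's contribution to the mixture at one time-frequency bin.

    x            : (I,) complex mixture coefficients over I channels
    variances    : (J,) non-negative source variances v_j
    spatial_covs : (J, I, I) Hermitian positive definite spatial covariances R_j
    Returns an array of shape (J, I) holding the estimated contributions.
    """
    source_covs = variances[:, None, None] * spatial_covs   # v_j R_j
    mix_cov = source_covs.sum(axis=0)                        # covariance of the mixture
    z = np.linalg.solve(mix_cov, x)                          # mix_cov^{-1} x
    return source_covs @ z                                   # v_j R_j mix_cov^{-1} x

# Assumed example: stereo mixture (I = 2) of three sources (J = 3), so under-determined.
rng = np.random.default_rng(0)
n_channels, n_sources = 2, 3
A = rng.standard_normal((n_sources, n_channels, n_channels)) \
    + 1j * rng.standard_normal((n_sources, n_channels, n_channels))
spatial_covs = A @ A.conj().transpose(0, 2, 1) + 0.1 * np.eye(n_channels)
variances = np.array([1.0, 0.5, 2.0])
x = rng.standard_normal(n_channels) + 1j * rng.standard_normal(n_channels)

images = wiener_source_images(x, variances, spatial_covs)
print(images.shape)
```

A useful sanity check on this kind of filter is that the estimated contributions add back up to the observed mixture, because the per-source gains sum to the identity matrix.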
Blind Separation of Quasi-Stationary Sources: Exploiting Convex Geometry in Covariance Domain
, 2015
Cited by 3 (3 self)
This paper revisits blind source separation of instantaneously mixed quasi-stationary sources (BSS-QSS), motivated by the observation that in certain applications (e.g., speech) there exist time frames during which only one source is active, or locally dominant. Combined with nonnegativity of source powers, this endows the problem with a nice convex geometry that enables elegant and efficient BSS solutions. Local dominance is tantamount to the so-called pure pixel/separability assumption in hyperspectral unmixing/nonnegative matrix factorization, respectively. Building on this link, a very simple algorithm called successive projection algorithm (SPA) is considered for estimating the mixing system in closed form. To complement SPA in the specific BSS-QSS context, an algebraic preprocessing procedure is proposed to suppress short-term source cross-correlation interference. The proposed procedure is simple, effective, and supported by theoretical analysis. Solutions based on volume minimization (VolMin) are also considered. By theoretical analysis, it is shown that VolMin guarantees perfect mixing system identifiability under an assumption more relaxed than (exact) local dominance—which means wider applicability in practice. Exploiting the specific structure of BSS-QSS, a fast VolMin algorithm is proposed for the overdetermined case. Careful simulations using real speech sources showcase the simplicity, efficiency, and accuracy of the proposed algorithms.
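The successive projection algorithm mentioned in this abstract can be sketched in its generic form for a separable non-negative factorization X = W H, where some columns of X are exactly columns of W (the "pure" or locally dominant columns) and the rest are convex combinations of them. The sketch below shows only that generic column-selection step. The covariance-domain preprocessing and the VolMin variants from the paper are not shown, and the function name `spa` and the synthetic data are assumptions.

```python
import numpy as np

def spa(X, r):
    """Successive projection algorithm: pick r columns of X that act as vertices.

    At each step the column with the largest Euclidean norm is selected, and
    all columns are projected onto the orthogonal complement of that column.
    """
    R = np.array(X, dtype=float)           # residual, modified in place
    picked = []
    for _ in range(r):
        norms = np.sum(R ** 2, axis=0)
        j = int(np.argmax(norms))
        u = R[:, j] / np.sqrt(norms[j])    # unit vector along the selected column
        R -= np.outer(u, u @ R)            # remove the component along u
        picked.append(j)
    return picked

# Assumed synthetic example: 3 sources, 10 columns, pure columns at 2, 5 and 7.
rng = np.random.default_rng(0)
pure_idx = [2, 5, 7]
W = rng.random((8, 3))                         # unknown mixing-related matrix
H = rng.dirichlet(np.ones(3), size=10).T       # columns are convex weights
H[:, pure_idx] = np.eye(3)                     # locally dominant columns
X = W @ H

picked = spa(X, 3)
print(sorted(picked))
```

The selection works because the Euclidean norm is convex, so over the convex hull of the columns of W its maximum is attained at a vertex, which is one of the pure columns.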
PROFESSIONALLY-PRODUCED MUSIC SEPARATION GUIDED BY COVERS
Cited by 2 (0 self)
This paper addresses the problem of demixing professionally produced music, i.e., recovering the musical source signals that compose a (2-channel stereo) commercial mix signal. Inspired by previous studies using MIDI synthesized or hummed signals as external references, we propose to use the multitrack signals of a cover interpretation to guide the separation process with a relevant initialization. This process is carried out within the framework of the multichannel convolutive NMF model and associated EM/MU estimation algorithms. Although subject to the limitations of the convolutive assumption, our experiments confirm the potential of using multitrack cover signals for source separation of commercial music.
Direction of arrival based spatial covariance model for blind sound source separation
- IEEE Transactions on Audio, Speech, and Language Processing
, 2014
Cited by 2 (2 self)
This paper addresses the problem of sound source separation from a multichannel microphone array capture via estimation of the source spatial covariance matrix (SCM) of a short-time Fourier transformed mixture signal. In many conventional audio separation algorithms the source mixing parameter estimation is done separately for each frequency, thus making them prone to errors and leading to suboptimal source estimates. In this paper we propose an SCM model which consists of a weighted sum of direction of arrival (DoA) kernels and estimate only the weights dependent on the source directions. In the proposed algorithm, the spatial properties of the sources become jointly optimized over all frequencies, leading to more coherent source estimates and mitigating the effect of spatial aliasing at high frequencies. The proposed SCM model is combined with a linear model for magnitudes and the parameter estimation is formulated in a complex-valued non-negative matrix factorization (CNMF) framework. Simulations consist of recordings made with an array the size of a hand-held device, with multiple microphones embedded inside the device casing. Separation quality of the proposed algorithm is shown to exceed the performance of existing state of the art separation methods with two sources when evaluated by objective separation quality metrics.
Index Terms: multichannel source separation, spatial covariance models, non-negative matrix factorization, direction of arrival estimation, array signal processing
INTERACTIVE REFINEMENT OF SUPERVISED AND SEMI-SUPERVISED SOUND SOURCE SEPARATION ESTIMATES
Cited by 2 (0 self)
We propose an interactive refinement method for supervised and semi-supervised single-channel source separation. The refinement method allows end-users to provide feedback to the separation process by painting on spectrogram displays of intermediate output results. The time-frequency annotations are then used to update the separation estimates and iteratively refine the results. The initial separation is performed using probabilistic latent component analysis and is then extended to incorporate the painting annotations using linear grouping expectation constraints via the framework of posterior regularization. Using a prototype user-interface, we show that the method is able to perform high-quality separation with minimal user-interaction.
Index Terms: source separation, probabilistic latent component analysis, user-interaction