Monaural sound source separation by nonnegative matrix factorization with temporal continuity and sparseness criteria
 IEEE Trans. On Audio, Speech and Lang. Processing
, 2007
Abstract

Cited by 89 (10 self)
Abstract: An unsupervised learning algorithm for the separation of sound sources in one-channel music signals is presented. The algorithm is based on factorizing the magnitude spectrogram of an input signal into a sum of components, each of which has a fixed magnitude spectrum and a time-varying gain. Each sound source, in turn, is modeled as a sum of one or more components. The parameters of the components are estimated by minimizing the reconstruction error between the input spectrogram and the model, while restricting the component spectrograms to be nonnegative and favoring components whose gains are slowly varying and sparse. Temporal continuity is favored by using a cost term which is the sum of squared differences between the gains in adjacent frames, and sparseness is favored by penalizing nonzero gains. The proposed iterative estimation algorithm is initialized with random values, and the gains and the spectra are then alternately updated using multiplicative update rules until the values converge. Simulation experiments were carried out using generated mixtures of pitched musical instrument samples and drum sounds. The performance of the proposed method was compared with independent subspace analysis and basic nonnegative matrix factorization, which are based on the same linear model. According to these simulations, the proposed method enables a better separation quality than the previous algorithms. In particular, the temporal continuity criterion improved the detection of pitched musical sounds. The sparseness criterion did not produce significant improvements. Index Terms: Acoustic signal analysis, audio source separation, blind source separation, music, nonnegative matrix factorization, sparse coding, unsupervised learning.
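The cost structure described in this abstract can be sketched in a few lines of Python. This is only an illustrative sketch, not the paper's algorithm: the squared-error reconstruction measure, the absolute-value sparseness penalty, the weights `alpha` and `beta`, and all function names are assumptions made here. The update step shown is the plain multiplicative rule of basic nonnegative matrix factorization (the baseline mentioned above); the paper's own update rules, which also account for the two penalty terms, are not reproduced.

```python
import random

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def cost(X, S, G, alpha=0.0, beta=0.0):
    """Reconstruction error plus temporal-continuity and sparseness penalties.

    X: spectrogram (frequencies x frames)
    S: component spectra (frequencies x components)
    G: component gains (components x frames)
    alpha, beta: hypothetical penalty weights (assumed names)
    """
    Y = matmul(S, G)
    # Assumed error measure: squared Euclidean distance.
    recon = sum((x - y) ** 2 for xr, yr in zip(X, Y) for x, y in zip(xr, yr))
    # Sum of squared differences between gains in adjacent frames.
    continuity = sum((g[t] - g[t - 1]) ** 2 for g in G for t in range(1, len(g)))
    # Assumed sparseness penalty: sum of gain magnitudes.
    sparseness = sum(abs(v) for g in G for v in g)
    return recon + alpha * continuity + beta * sparseness

def nmf_step(X, S, G, eps=1e-12):
    """One pass of plain multiplicative updates (squared error, no penalties)."""
    St = transpose(S)
    num, den = matmul(St, X), matmul(matmul(St, S), G)
    G = [[g * n / (d + eps) for g, n, d in zip(gr, nr, dr)]
         for gr, nr, dr in zip(G, num, den)]
    Gt = transpose(G)
    num, den = matmul(X, Gt), matmul(S, matmul(G, Gt))
    S = [[s * n / (d + eps) for s, n, d in zip(sr, nr, dr)]
         for sr, nr, dr in zip(S, num, den)]
    return S, G

def factorize(X, n_components, n_iter=200, seed=0):
    """Random positive initialization followed by alternating updates."""
    rng = random.Random(seed)
    S = [[rng.random() + 0.1 for _ in range(n_components)] for _ in X]
    G = [[rng.random() + 0.1 for _ in X[0]] for _ in range(n_components)]
    for _ in range(n_iter):
        S, G = nmf_step(X, S, G)
    return S, G
```

Calling `factorize(X, 2)` on a small nonnegative matrix `X` returns spectra and gains that stay nonnegative, because each update only multiplies by a ratio of nonnegative quantities; `cost(X, S, G, alpha, beta)` then reports how the two penalties would score that solution.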
Sound Source Separation in Monaural Music Signals
, 2006
Abstract

Cited by 22 (3 self)
Sound source separation refers to the task of estimating the signals produced by individual sound sources from a complex acoustic mixture. It has several applications, since monophonic signals can be processed more efficiently and flexibly than polyphonic mixtures. This thesis deals with the separation of monaural, or one-channel, music recordings. We concentrate on separation methods where the sources to be separated are not known beforehand. Instead, the separation is enabled by utilizing the common properties of real-world sound sources, which are their continuity, sparseness, and repetition in time and frequency, and their harmonic spectral structures. One of the separation approaches taken here uses unsupervised learning and the other uses model-based inference based on sinusoidal modeling. Most of the existing unsupervised separation algorithms are based on a linear instantaneous signal model, where each frame of the input mixture signal is
Nonnegative tensor factorisation for sound source separation
 IN: PROCEEDINGS OF IRISH SIGNALS AND SYSTEMS CONFERENCE
, 2005
Abstract

Cited by 15 (1 self)
... is introduced which extends current matrix factorisation techniques to deal with tensors. The effectiveness of the algorithm is then demonstrated through tests on synthetic data. The algorithm is then employed as a means of performing sound source separation on two-channel mixtures, and the separation capabilities of the algorithm are demonstrated on a two-channel mixture containing saxophone, strings and bass guitar.
Sound Source Separation using Shifted Nonnegative Tensor Factorisation
 Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
, 2006
Abstract

Cited by 13 (0 self)
Recently, shifted Nonnegative Matrix Factorisation was developed as a means of separating harmonic instruments from single-channel mixtures. However, in many cases two or more channels are available, in which case it would be advantageous to have a multichannel version of the algorithm. To this end, a shifted Nonnegative Tensor Factorisation algorithm is derived, which extends shifted Nonnegative Matrix Factorisation to the multichannel case. The use of this algorithm for multichannel sound source separation of harmonic instruments is demonstrated. Further, it is shown that the algorithm can be used to perform Nonnegative Tensor Deconvolution, a multichannel version of Nonnegative Matrix Deconvolution, to separate sound sources which have time-evolving spectra from multichannel signals.
Some Case Studies in Automatic Descriptor Extraction
Abstract

Cited by 3 (3 self)
Abstract. This work aims to evaluate the effectiveness of EDS as a tool to automatically extract descriptors for real-world problems, such as melody extraction, chord recognition, and sound classification, comparing its performance and development time to traditional approaches. Each of these problems constitutes a case study, and along with the comparative results we present some remarks about the descriptor extraction procedure.
ACOUSTIC MODELLING OF DRUM SOUNDS WITH HIDDEN MARKOV MODELS FOR MUSIC TRANSCRIPTION
Abstract

Cited by 3 (1 self)
This paper describes two methods for applying hidden Markov models (HMMs) to acoustic modelling of drum sound events for polyphonic music transcription. The proposed methods are instrument-wise binary modelling and modelling of instrument combinations. In the first, each target instrument is modelled with a “sound” model and all target instruments share a “silence” model. Each instrument is transcribed independently from the others. In the latter method, different instrument combinations are modelled, and an additional “silence” model is created. The proposed methods are evaluated with simulations with acoustic data, and compared with two reference methods. Simulations show that combination modelling performs better than instrument-wise modelling.
Monaural Sound Source Separation by Perceptually Weighted Non-Negative Matrix Factorization
Abstract

Cited by 2 (0 self)
Abstract: A data-adaptive algorithm for the separation of sound sources from one-channel signals is presented. The algorithm applies weighted nonnegative matrix factorization on the power spectrogram of the input signal. Perceptually motivated weights for each critical band in each frame are used to model the loudness perception of the human auditory system. The method compresses high-energy components, and enables the estimation of perceptually significant low-energy characteristics of sources. The power spectrogram is factorized into a sum of components which have a fixed magnitude spectrum with a time-varying gain. Each source consists of one or more components. The parameters of the components are estimated by minimizing the weighted divergence between the observed power spectrogram and the model, for which a weighted nonnegative matrix factorization algorithm is proposed. Simulation experiments were carried out using generated mixtures of pitched musical instrument samples and percussive sounds. The performance of the proposed method was compared with other separation algorithms which are based on the same signal model. These include, for example, independent subspace analysis and sparse coding. According to the simulations, the proposed method enables perceptually better separation quality than the existing algorithms. Demonstration signals are available at
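As a rough illustration of what a weighted divergence between an observed spectrogram and a model can look like, the sketch below weights a generalized Kullback-Leibler divergence element by element. The exact divergence and weighting used in the paper are not given in this abstract, so the formula, the function name `weighted_divergence`, and the per-bin weight layout are all assumptions; a per-band weight would simply be repeated for every frequency bin inside that critical band.

```python
import math

def weighted_divergence(X, Y, W, eps=1e-12):
    """Elementwise-weighted generalized KL divergence (assumed form).

    X: observed power spectrogram (frequencies x frames)
    Y: model spectrogram of the same shape
    W: nonnegative weights of the same shape (hypothetical layout:
       one weight per bin and frame)
    eps: small constant guarding against log(0) and division by zero
    """
    total = 0.0
    for x_row, y_row, w_row in zip(X, Y, W):
        for x, y, w in zip(x_row, y_row, w_row):
            total += w * (x * math.log((x + eps) / (y + eps)) - x + y)
    return total
```

The value is zero when the model matches the observation exactly, and a larger weight makes a mismatch in that bin and frame count more, which is how low-energy but perceptually significant regions could be given more influence on the fit.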
Shifted 2D Nonnegative Tensor Factorisation
Abstract

Cited by 2 (0 self)
... developed as a means of separating harmonic instruments from single-channel mixtures. This technique uses a model which is convolutive in both time and frequency, and so can capture instruments which have both time-varying spectra and time-varying fundamental frequencies simultaneously. However, in many cases two or more channels are available, in which case it would be advantageous to have a multichannel version of the algorithm. To this end, a shifted 2D Nonnegative Tensor Factorisation algorithm is derived, which extends Nonnegative Matrix Factor 2D Deconvolution to the multichannel case. The use of this algorithm for multichannel sound source separation of pitched instruments is demonstrated.
Extended Nonnegative Tensor Factorisation Models for Musical Sound Source Separation
 doi:10.1155/2008/872425
Abstract
Recently, shift-invariant tensor factorisation algorithms have been proposed for the purposes of sound source separation of pitched musical instruments. However, in practice, existing algorithms require the use of log-frequency spectrograms to allow shift invariance in frequency, which causes problems when attempting to resynthesise the separated sources. Further, it is difficult to impose harmonicity constraints on the recovered basis functions. This paper proposes a new additive synthesis-based approach which allows the use of linear-frequency spectrograms as well as imposing strict harmonic constraints, resulting in an improved model. Further, these additional constraints allow the addition of a source filter model to the factorisation framework, and an extended model which is capable of separating mixtures of pitched and percussive instruments simultaneously. Copyright © 2008 Derry FitzGerald et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.