Monaural sound source separation by nonnegative matrix factorization with temporal continuity and sparseness criteria
- IEEE Trans. On Audio, Speech and Lang. Processing, 2007
Cited by 189 (30 self)
An unsupervised learning algorithm for the separation of sound sources in one-channel music signals is presented. The algorithm is based on factorizing the magnitude spectrogram of an input signal into a sum of components, each of which has a fixed magnitude spectrum and a time-varying gain. Each sound source, in turn, is modeled as a sum of one or more components. The parameters of the components are estimated by minimizing the reconstruction error between the input spectrogram and the model, while restricting the component spectrograms to be nonnegative and favoring components whose gains are slowly varying and sparse. Temporal continuity is favored by using a cost term which is the sum of squared differences between the gains in adjacent frames, and sparseness is favored by penalizing nonzero gains. The proposed iterative estimation algorithm is initialized with random values, and the gains and the spectra are then alternately updated using multiplicative update rules until the values converge. Simulation experiments were carried out using generated mixtures of pitched musical instrument samples and drum sounds. The performance of the proposed method was compared with independent subspace analysis and basic nonnegative matrix factorization, which are based on the same linear model. According to these simulations, the proposed method enables a better separation quality than the previous algorithms. In particular, the temporal continuity criterion improved the detection of pitched musical sounds. The sparseness criterion did not produce significant improvements.
Index Terms: acoustic signal analysis, audio source separation, blind source separation, music, nonnegative matrix factorization, sparse coding, unsupervised learning.
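The cost structure described in this abstract can be illustrated with a small sketch. It is not the paper's algorithm: it assumes a squared-error reconstruction term, unnormalized penalties, and a unit-norm rescaling of the spectra after each iteration, all of which are simplifications chosen here. The function name `nmf_temporal_sparse` and the weights `alpha` (temporal continuity) and `beta` (sparseness) are hypothetical.

```python
import numpy as np

def nmf_temporal_sparse(X, n_components, alpha=0.1, beta=0.1, n_iter=200, seed=0):
    """Factor a nonnegative spectrogram X (freq x frames) as B @ G.

    Illustrative cost (an assumption, not taken from the paper):
      0.5 * ||X - B G||^2
      + alpha * 0.5 * sum_k sum_t (G[k, t] - G[k, t-1])^2   (temporal continuity)
      + beta * sum(G)                                        (sparseness)
    """
    rng = np.random.default_rng(seed)
    eps = 1e-12
    n_freq, n_frames = X.shape
    B = rng.random((n_freq, n_components)) + eps   # component spectra
    G = rng.random((n_components, n_frames)) + eps  # time-varying gains

    # Each frame has two neighbours, except the first and last frames.
    counts = np.full(n_frames, 2.0)
    counts[0] = counts[-1] = 1.0

    for _ in range(n_iter):
        # Sum of the gains in the adjacent frames (negative part of the
        # continuity gradient); counts * G is the positive part.
        neighbours = np.zeros_like(G)
        neighbours[:, 1:] += G[:, :-1]
        neighbours[:, :-1] += G[:, 1:]

        # Multiplicative update of the gains: negative gradient parts in the
        # numerator, positive parts in the denominator, so G stays nonnegative.
        G *= (B.T @ X + alpha * neighbours) / (
            B.T @ B @ G + alpha * counts * G + beta + eps
        )

        # Multiplicative update of the spectra (reconstruction term only).
        B *= (X @ G.T) / (B @ (G @ G.T) + eps)

        # Fix the scale ambiguity: unit-norm spectra, gains rescaled so that
        # the product B @ G is unchanged.
        norms = np.linalg.norm(B, axis=0) + eps
        B /= norms
        G *= norms[:, None]

    return B, G
```

With `alpha=0` and `beta=0` the loop reduces to basic NMF with squared error, which makes it easy to compare the effect of each penalty on the estimated gains.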
Sound Source Separation in Monaural Music Signals
- 2006
Cited by 36 (4 self)
Sound source separation refers to the task of estimating the signals produced by individual sound sources from a complex acoustic mixture. It has several applications, since monophonic signals can be processed more efficiently and flexibly than polyphonic mixtures. This thesis deals with the separation of monaural, or one-channel, music recordings. We concentrate on separation methods where the sources to be separated are not known beforehand. Instead, the separation is enabled by utilizing the common properties of real-world sound sources, which are their continuity, sparseness, and repetition in time and frequency, and their harmonic spectral structures. One of the separation approaches taken here uses unsupervised learning and the other uses model-based inference based on sinusoidal modeling. Most of the existing unsupervised separation algorithms are based on a linear instantaneous signal model, where each frame of the input mixture signal is
Non-negative tensor factorisation for sound source separation
- In: Proceedings of Irish Signals and Systems Conference, 2005
Cited by 28 (2 self)
... is introduced which extends current matrix factorisation techniques to deal with tensors. The effectiveness of the algorithm is then demonstrated through tests on synthetic data. The algorithm is then employed as a means of performing sound source separation on two-channel mixtures, and the separation capabilities of the algorithm are demonstrated on a two-channel mixture containing saxophone, strings and bass guitar.
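As a rough illustration of extending matrix factorisation to tensors, the sketch below factors a nonnegative channel x frequency x time tensor into per-component channel gains, spectra, and activations. It assumes a squared-error cost and standard multiplicative updates, which may differ from the cost function used in the paper; the names `ntf` and `reconstruct` are hypothetical.

```python
import numpy as np

def reconstruct(Q, A, S):
    """Model tensor: V[c, f, t] = sum_k Q[c, k] * A[f, k] * S[k, t]."""
    return np.einsum('ck,fk,kt->cft', Q, A, S)

def ntf(X, n_components, n_iter=200, seed=0):
    """Nonnegative factorisation of X (channels x freq x frames).

    Squared-error multiplicative updates (an assumption made for this sketch).
    """
    rng = np.random.default_rng(seed)
    eps = 1e-12
    n_chan, n_freq, n_frames = X.shape
    Q = rng.random((n_chan, n_components)) + eps    # channel gains
    A = rng.random((n_freq, n_components)) + eps    # component spectra
    S = rng.random((n_components, n_frames)) + eps  # time activations

    for _ in range(n_iter):
        # Update each factor in turn, recomputing the model before each step.
        V = reconstruct(Q, A, S)
        Q *= np.einsum('cft,fk,kt->ck', X, A, S) / (
            np.einsum('cft,fk,kt->ck', V, A, S) + eps)

        V = reconstruct(Q, A, S)
        A *= np.einsum('cft,ck,kt->fk', X, Q, S) / (
            np.einsum('cft,ck,kt->fk', V, Q, S) + eps)

        V = reconstruct(Q, A, S)
        S *= np.einsum('cft,ck,fk->kt', X, Q, A) / (
            np.einsum('cft,ck,fk->kt', V, Q, A) + eps)

    return Q, A, S
```

For a two-channel mixture, `X` would have shape `(2, n_freq, n_frames)`, and the rows of `Q` indicate how strongly each component appears in each channel, which is the extra cue a tensor model adds over single-channel NMF.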
Sound Source Separation using Shifted Non-negative Tensor Factorisation
- Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2006
Cited by 15 (0 self)
Recently, shifted Non-negative Matrix Factorisation was developed as a means of separating harmonic instruments from single-channel mixtures. However, in many cases two or more channels are available, in which case it would be advantageous to have a multi-channel version of the algorithm. To this end, a shifted Non-negative Tensor Factorisation algorithm is derived, which extends shifted Non-negative Matrix Factorisation to the multi-channel case. The use of this algorithm for multi-channel sound source separation of harmonic instruments is demonstrated. Further, it is shown that the algorithm can be used to perform Non-negative Tensor Deconvolution, a multi-channel version of Non-negative Matrix Deconvolution, to separate sound sources which have time-evolving spectra from multi-channel signals.
Monaural Sound Source Separation by Perceptually Weighted Non-Negative Matrix Factorization
Cited by 10 (0 self)
A data-adaptive algorithm for the separation of sound sources from one-channel signals is presented. The algorithm applies weighted non-negative matrix factorization to the power spectrogram of the input signal. Perceptually motivated weights for each critical band in each frame are used to model the loudness perception of the human auditory system. The method compresses high-energy components, and enables the estimation of perceptually significant low-energy characteristics of sources. The power spectrogram is factorized into a sum of components which have a fixed magnitude spectrum with a time-varying gain. Each source consists of one or more components. The parameters of the components are estimated by minimizing the weighted divergence between the observed power spectrogram and the model, for which a weighted non-negative matrix factorization algorithm is proposed. Simulation experiments were carried out using generated mixtures of pitched musical instrument samples and percussive sounds. The performance of the proposed method was compared with other separation algorithms which are based on the same signal model. These include, for example, independent subspace analysis and sparse coding. According to the simulations, the proposed method enables perceptually better separation quality than the existing algorithms. Demonstration signals are available at
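A weighted factorization of this kind can be sketched as follows. The sketch assumes the divergence is the generalized Kullback-Leibler divergence and takes the weight matrix `W` as a given input; how the perceptual weights are computed per critical band is not reproduced here. The names `weighted_kl` and `weighted_kl_nmf` are hypothetical.

```python
import numpy as np

def weighted_kl(X, W, V, eps=1e-12):
    """Weighted generalized KL divergence: sum W * (X log(X/V) - X + V)."""
    return float(np.sum(W * (X * np.log((X + eps) / (V + eps)) - X + V)))

def weighted_kl_nmf(X, W, n_components, n_iter=200, seed=0):
    """Factor X (freq x frames) as B @ G under elementwise weights W.

    W has the same shape as X; larger weights make errors in that
    time-frequency cell count more. Computing perceptual weights is
    outside the scope of this sketch.
    """
    rng = np.random.default_rng(seed)
    eps = 1e-12
    n_freq, n_frames = X.shape
    B = rng.random((n_freq, n_components)) + eps    # component spectra
    G = rng.random((n_components, n_frames)) + eps  # time-varying gains

    for _ in range(n_iter):
        # Gain update: weighted ratio of observation to model in the
        # numerator, the weights alone in the denominator.
        R = W * X / (B @ G + eps)
        G *= (B.T @ R) / (B.T @ W + eps)

        # Spectrum update, using the model recomputed with the new gains.
        R = W * X / (B @ G + eps)
        B *= (R @ G.T) / (W @ G.T + eps)

    return B, G
```

Setting `W` to all ones recovers the unweighted divergence-based updates, so the same function can serve as a baseline when experimenting with different weightings.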
Generalised prior subspace analysis for polyphonic pitch transcription
- in Proc. Int. Conf. on Digital Audio Effects (DAFx), 2005
Cited by 10 (3 self)
A reformulation of Prior Subspace Analysis (PSA) is presented, which restates the problem as that of fitting an undercomplete signal dictionary to a spectrogram. Further, a generalization of PSA is derived which allows the transcription of polyphonic pitched instruments. This involves the translation of a single frequency prior subspace of a note to approximate other notes, overcoming the problem of needing a separate basis function for each note played by an instrument. Examples are then demonstrated which show the utility of the generalised PSA algorithm for the purposes of polyphonic pitch transcription.
Separation of Musical Sources and Structure from Single-Channel Polyphonic Recordings University of
- 2006
Cited by 8 (0 self)
The thesis deals principally with the separation of pitched sources from single-channel polyphonic musical recordings. The aim is to extract from a mixture a set of pitched instruments or sources, where each source contains a set of similarly sounding events or notes, and each note is seen as comprising partial, transient and noise content. The work also has implications for separating non-pitched or percussive sounds from recordings, and in general, for unsupervised clustering of a list of detected audio events in a recording into a meaningful set of source classes. The alignment of a symbolic score/MIDI representation with the recording constitutes a pre-processing stage. The three main areas of contribution are as follows. Firstly, the design of harmonic tracking algorithms and spectral-filtering techniques for removing harmonics from the mixture, where particular attention has been paid to the case of harmonics which are overlapping in frequency. Secondly, some studies are presented for separating transient attacks from recordings, both when they are distinguishable from and when they are overlapping in time with other transients. This section also includes a method which proposes that the behaviours of the harmonic and noise components of a note are partially correlated. This is used to share the noise component of a mixture of pitched notes between the interfering sources. Thirdly, unsupervised clustering has been applied to the task of grouping a set of separated notes from the recording into sources, where notes belonging to the same source ideally have similar features or attributes. Issues relating to feature computation, feature selection, dimensionality and dependence on a symbolic music representation are explored. Applications of this work exist in audio spatialisation, audio restoration, music content description, effects processing and elsewhere.
Shifted 2D Non-negative Tensor Factorisation
Cited by 5 (2 self)
... developed as a means of separating harmonic instruments from single-channel mixtures. This technique uses a model which is convolutive in both time and frequency, and so can capture instruments which have both time-varying spectra and time-varying fundamental frequencies simultaneously. However, in many cases two or more channels are available, in which case it would be advantageous to have a multi-channel version of the algorithm. To this end, a shifted 2D Non-negative Tensor Factorisation algorithm is derived, which extends Non-negative Matrix Factor 2D Deconvolution to the multi-channel case. The use of this algorithm for multi-channel sound source separation of pitched instruments is demonstrated.
Acoustic Modelling of Drum Sounds with Hidden Markov Models for Music Transcription
Cited by 5 (1 self)
This paper describes two methods for applying hidden Markov models (HMMs) to acoustic modelling of drum sound events for polyphonic music transcription. The proposed methods are instrument-wise binary modelling and modelling of instrument combinations. In the first, each target instrument is modelled with a “sound” model and all target instruments share a “silence” model. Each instrument is transcribed independently from the others. In the latter method, different instrument combinations are modelled, and an additional “silence” model is created. The proposed methods are evaluated in simulations with acoustic data, and compared with two reference methods. Simulations show that combination modelling performs better than instrument-wise modelling.