Bayesian extensions to non-negative matrix factorisation for audio signal modelling
- In ICASSP, 2008
"... We describe the underlying probabilistic generative signal model of non-negative matrix factorisation (NMF) and propose a realistic conjugate priors on the matrices to be estimated. A conjugate Gamma chain prior enables modelling the spectral smoothness of natural sounds in general, and other prior ..."
Abstract
-
Cited by 42 (6 self)
- Add to MetaCart
(Show Context)
We describe the underlying probabilistic generative signal model of non-negative matrix factorisation (NMF) and propose realistic conjugate priors on the matrices to be estimated. A conjugate Gamma chain prior enables modelling the spectral smoothness of natural sounds in general, and other prior knowledge about the spectra of the sounds can be used without resorting to overly restrictive techniques in which some of the parameters are fixed. The resulting algorithm, while retaining the attractive features of standard NMF such as fast convergence and easy implementation, outperforms existing NMF strategies in a single-channel audio source separation and detection task. Index Terms: acoustic signal processing, matrix decomposition, MAP estimation, source separation.
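For orientation, the baseline this Bayesian treatment extends factorises a magnitude spectrogram as V ≈ WH. A minimal sketch of the classic multiplicative updates for the generalised KL divergence follows; the Gamma chain priors described in the abstract would enter as extra terms in these updates and are deliberately not reproduced here.

```python
import numpy as np

def nmf_kl(V, rank, n_iter=200, eps=1e-9):
    """Multiplicative-update NMF minimising generalised KL divergence.

    Standard baseline only; the paper's conjugate Gamma chain priors
    are not included.
    """
    F, T = V.shape
    rng = np.random.default_rng(0)
    W = rng.random((F, rank)) + eps   # basis spectra (frequency x rank)
    H = rng.random((rank, T)) + eps   # activations (rank x time)
    ones = np.ones_like(V)
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.T @ ones + eps)   # update activations
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (ones @ H.T + eps)   # update basis spectra
    return W, H
```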
Probabilistic latent variable models as nonnegative factorizations
- Computational Intelligence and Neuroscience, 2008. Article ID 947438
"... This paper presents a family of probabilistic latent variable models that can be used for analysis of nonnegative data. We show that there are strong ties between nonnegative matrix factorization and this family, and provide some straightforward extensions which can help in dealing with shift invar ..."
Abstract
-
Cited by 38 (6 self)
- Add to MetaCart
(Show Context)
This paper presents a family of probabilistic latent variable models that can be used for analysis of nonnegative data. We show that there are strong ties between nonnegative matrix factorization and this family, and provide some straightforward extensions which can help in dealing with shift invariances, higher-order decompositions and sparsity constraints. We argue through these extensions that the use of this approach allows for rapid development of complex statistical models for analyzing nonnegative data.
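The family can be illustrated with its simplest member, which treats the normalised spectrogram as a histogram of (frequency, time) draws with a shared latent component: P(f,t) = Σ_z P(z) P(f|z) P(t|z). A minimal EM sketch follows, without the shift-invariant, higher-order, or sparse extensions the abstract mentions.

```python
import numpy as np

def plca(V, n_z, n_iter=100, eps=1e-12):
    """Basic latent-variable decomposition of a nonnegative matrix V:
    V(f,t) ~ sum_z P(z) P(f|z) P(t|z), fit by EM."""
    F, T = V.shape
    rng = np.random.default_rng(0)
    Pz = np.full(n_z, 1.0 / n_z)
    Pf_z = rng.random((F, n_z)); Pf_z /= Pf_z.sum(0)
    Pt_z = rng.random((T, n_z)); Pt_z /= Pt_z.sum(0)
    for _ in range(n_iter):
        # E-step: posterior over latent components for each (f, t) cell
        joint = Pz[None, None, :] * Pf_z[:, None, :] * Pt_z[None, :, :]
        post = joint / (joint.sum(-1, keepdims=True) + eps)   # (F, T, Z)
        # M-step: re-estimate all factors from expected counts
        counts = V[:, :, None] * post
        Pf_z = counts.sum(1); Pf_z /= Pf_z.sum(0) + eps
        Pt_z = counts.sum(0); Pt_z /= Pt_z.sum(0) + eps
        Pz = counts.sum((0, 1)); Pz /= Pz.sum() + eps
    return Pz, Pf_z, Pt_z
```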
Sound Source Separation in Monaural Music Signals
, 2006
"... Sound source separation refers to the task of estimating the signals produced by individual sound sources from a complex acoustic mixture. It has several applications, since monophonic signals can be processed more efficiently and flexibly than polyphonic mixtures. This thesis deals with the separat ..."
Abstract
-
Cited by 36 (4 self)
- Add to MetaCart
(Show Context)
Sound source separation refers to the task of estimating the signals produced by individual sound sources from a complex acoustic mixture. It has several applications, since monophonic signals can be processed more efficiently and flexibly than polyphonic mixtures. This thesis deals with the separation of monaural, or one-channel, music recordings. We concentrate on separation methods where the sources to be separated are not known beforehand. Instead, the separation is enabled by utilizing common properties of real-world sound sources: their continuity, sparseness, and repetition in time and frequency, and their harmonic spectral structures. One of the separation approaches taken here uses unsupervised learning and the other uses model-based inference based on sinusoidal modeling. Most of the existing unsupervised separation algorithms are based on a linear instantaneous signal model, where each frame of the input mixture signal is …
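The linear instantaneous model referenced at the truncated end of the abstract is conventionally written frame by frame as a nonnegative combination of fixed basis spectra; the notation below is the standard one for this model family and is assumed rather than taken from the thesis itself.

```latex
% Each observed magnitude spectrum x_t is a nonnegative weighted
% sum of N fixed basis spectra b_n with frame-varying gains g_{n,t}.
\[
  \mathbf{x}_t \;\approx\; \sum_{n=1}^{N} g_{n,t}\,\mathbf{b}_n,
  \qquad g_{n,t} \ge 0,\quad \mathbf{b}_n \ge \mathbf{0}.
\]
```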
A Sparse Non-Parametric Approach for Single Channel Separation of Known Sounds
"... In this paper we present an algorithm for separating mixed sounds from a monophonic recording. Our approach makes use of training data which allows us to learn representations of the types of sounds that compose the mixture. In contrast to popular methods that attempt to extract compact generalizabl ..."
Abstract
-
Cited by 21 (8 self)
- Add to MetaCart
(Show Context)
In this paper we present an algorithm for separating mixed sounds from a monophonic recording. Our approach makes use of training data which allows us to learn representations of the types of sounds that compose the mixture. In contrast to popular methods that attempt to extract compact generalizable models for each sound from training data, we employ the training data itself as a representation of the sources in the mixture. We show that mixtures of known sounds can be described as sparse combinations of the training data itself, and in doing so produce significantly better separation results as compared to similar systems based on compact statistical models. Keywords: Example-Based Representation, Signal Separation, Sparse Models.
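A minimal sketch of the example-based idea, under assumed names: each known source contributes its raw training spectrogram frames as dictionary columns, activations are kept sparse with an L1 penalty inside KL-style multiplicative updates, and each source is rebuilt from its own block. This is an illustrative stand-in, not the paper's exact estimator.

```python
import numpy as np

def separate_with_exemplars(V, D_list, lam=0.1, n_iter=200, eps=1e-9):
    """Exemplar-based separation sketch: the dictionary is the training
    data itself, one (F, n_frames) block per known source in D_list."""
    D = np.hstack(D_list)                     # (F, total exemplars)
    H = np.random.default_rng(0).random((D.shape[1], V.shape[1])) + eps
    for _ in range(n_iter):
        DH = D @ H + eps
        # KL-NMF multiplicative update with L1 sparsity penalty lam
        H *= (D.T @ (V / DH)) / (D.T.sum(1, keepdims=True) + lam + eps)
    # Reconstruct each source from its own exemplar block
    out, start = [], 0
    for Ds in D_list:
        stop = start + Ds.shape[1]
        out.append(Ds @ H[start:stop])
        start = stop
    return out
```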
Sparse overcomplete decomposition for single channel speaker separation
- In ICASSP, 2007
"... We present an algorithm for separating multiple speakers from a mixed single channel recording. The algorithm is based on a model proposed by Raj and Smaragdis [6]. The idea is to extract certain characteristic spectro-temporal basis functions from training data for individual speakers and decompose ..."
Abstract
-
Cited by 12 (6 self)
- Add to MetaCart
(Show Context)
We present an algorithm for separating multiple speakers from a mixed single-channel recording. The algorithm is based on a model proposed by Raj and Smaragdis [6]. The idea is to extract characteristic spectro-temporal basis functions from training data for individual speakers and decompose the mixed signals as linear combinations of these learned bases. In other words, their model extracts a compact code of basis functions that can explain the space spanned by the spectral vectors of a speaker. In our model, we generate a sparse distributed code in which we have more basis functions than the dimensionality of the space. We propose a probabilistic framework to achieve sparsity. Experiments show that the resulting sparse code better captures the structure in the data and hence leads to better separation.
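Once per-speaker bases and activations are in hand, a common way to resynthesise each speaker is Wiener-style soft masking of the mixture spectrogram; a generic sketch follows (the paper's own reconstruction step may differ, and all names here are assumptions).

```python
import numpy as np

def soft_mask_reconstruct(V_mix, W_a, H_a, W_b, H_b, eps=1e-9):
    """Rebuild two speakers from their bases (W) and activations (H)
    by soft-masking the observed mixture spectrogram V_mix."""
    Sa, Sb = W_a @ H_a, W_b @ H_b       # per-speaker model spectrograms
    total = Sa + Sb + eps
    return V_mix * (Sa / total), V_mix * (Sb / total)
```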
Prior structures for time-frequency energy distributions
- In Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA ’07), 2007
"... We introduce a framework for probabilistic modelling of timefrequency energy distributions based on correlated Gamma and inverse Gamma random variables. One advantage of the approach is that the resulting class of models are conjugate which makes inference easier. Moreover, both positivity and addit ..."
Abstract
-
Cited by 9 (3 self)
- Add to MetaCart
(Show Context)
We introduce a framework for probabilistic modelling of time-frequency energy distributions based on correlated Gamma and inverse Gamma random variables. One advantage of the approach is that the resulting class of models is conjugate, which makes inference easier. Moreover, both positivity and additivity follow naturally in this framework. We illustrate how generic models (applicable to a broad class of signals) and more specialised models can be designed to model harmonicity, spectral continuity and/or changepoints. We show simulation results that illustrate the potential of the approach on a broad spectrum of audio processing applications such as denoising, source separation and transcription.
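One way to get a feel for such chains is to sample one. The construction below alternates Gamma and inverse-Gamma draws so that every variable stays positive and is centred on its predecessor, giving smooth positive trajectories; it is an illustrative parametrisation, not necessarily the paper's exact conjugate chain.

```python
import numpy as np

def sample_gamma_chain(K, a=20.0, init=1.0, seed=0):
    """Sample a positive, positively correlated chain of length K by
    alternating Gamma and inverse-Gamma links.

    Larger shape `a` means stronger coupling, i.e. a smoother chain.
    """
    rng = np.random.default_rng(seed)
    v, z = np.empty(K), init
    for k in range(K):
        v[k] = rng.gamma(a, z / a)                      # Gamma link, mean z
        z = 1.0 / rng.gamma(a, 1.0 / ((a - 1) * v[k]))  # inverse-Gamma link, mean v[k]
    return v
```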
Separating a Foreground Singer from Background Music
"... In this paper we present a algorithm for separating singing voices from background music in popular songs. The algorithm is derived by modelling the magnitude spectrogram of audio signals as the outcome of draws from a discrete bi-variate random process that generates time-frequency pairs. The spect ..."
Abstract
-
Cited by 9 (0 self)
- Add to MetaCart
(Show Context)
In this paper we present an algorithm for separating singing voices from background music in popular songs. The algorithm is derived by modelling the magnitude spectrogram of audio signals as the outcome of draws from a discrete bi-variate random process that generates time-frequency pairs. The spectrogram of a song is assumed to have been obtained through draws from the distributions underlying the music and the vocals, respectively. The parameters of the underlying distribution are learnt from the observed spectrogram of the song. The spectrogram of the separated vocals is then derived by estimating the fraction of draws that were obtained from its distribution. In the paper we present the algorithm within a framework that allows personalization of popular songs, by separating out the vocals, processing them to one’s own tastes, and remixing them. Our experiments reveal that we are effectively able to separate out the vocals in a song and personalize them to our tastes.
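A minimal sketch of the two-distribution idea, under assumed names: frequency distributions for the accompaniment (learned elsewhere, e.g. from non-vocal material) are held fixed while vocal components and per-frame weights are estimated by EM, and the vocal spectrogram is read off as the expected fraction of draws per time-frequency cell. This is an illustrative stand-in for the paper's bi-variate draw model.

```python
import numpy as np

def extract_vocals(V, Pf_music, n_vocal, n_iter=100, eps=1e-12):
    """Pf_music: (F, n_music) fixed frequency multinomials (columns sum
    to 1); vocal multinomials and per-frame weights are fit by EM."""
    F, T = V.shape
    rng = np.random.default_rng(0)
    Pf_vocal = rng.random((F, n_vocal))
    Pf = np.hstack([Pf_music, Pf_vocal / Pf_vocal.sum(0)])  # (F, Z)
    n_fix, Z = Pf_music.shape[1], Pf.shape[1]
    Pz_t = np.full((Z, T), 1.0 / Z)          # per-frame mixture weights
    for _ in range(n_iter):
        # E-step: responsibility of each component for every (f, t) cell
        joint = Pf[:, None, :] * Pz_t.T[None, :, :]        # (F, T, Z)
        post = joint / (joint.sum(-1, keepdims=True) + eps)
        counts = V[:, :, None] * post
        # M-step: weights update everywhere; spectra only on the vocal side
        Pz_t = counts.sum(0).T
        Pz_t /= Pz_t.sum(0) + eps
        new_Pf = counts.sum(1)
        Pf[:, n_fix:] = new_Pf[:, n_fix:] / (new_Pf[:, n_fix:].sum(0) + eps)
    # Fraction of draws attributed to vocal components, per (f, t) cell
    frac = (Pf[:, n_fix:] @ Pz_t[n_fix:]) / (Pf @ Pz_t + eps)
    return V * frac
```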
Latent Dirichlet decomposition for single channel speaker separation
- In ICASSP, 2006
"... We present an algorithm for the separation of multiple speakers from mixed single-channel recordings by latent variable decomposition of the speech spectrogram. We model each magnitude spectral vector in the short-time Fourier transform of a speech signal as the outcome of a discrete random process ..."
Abstract
-
Cited by 6 (4 self)
- Add to MetaCart
(Show Context)
We present an algorithm for the separation of multiple speakers from mixed single-channel recordings by latent variable decomposition of the speech spectrogram. We model each magnitude spectral vector in the short-time Fourier transform of a speech signal as the outcome of a discrete random process that generates frequency bin indices. The distribution of the process is modelled as a mixture of multinomial distributions, such that the mixture weights of the component multinomials vary from analysis window to analysis window. The component multinomials are assumed to be speaker-specific and are learnt from training signals for each speaker. We model the prior distribution of the mixture weights for each speaker as a Dirichlet distribution. The distributions representing magnitude spectral vectors of the mixed signal are decomposed into mixtures of the multinomials for all component speakers. The frequency distribution, i.e. the spectrum, for each speaker is reconstructed from this decomposition.
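The effect of the Dirichlet prior shows up as pseudo-counts in the EM weight update: with prior Dir(α) on the frame-wise mixture weights, the MAP M-step takes the form below. This is the standard MAP-EM result for multinomial mixtures with a Dirichlet prior; the notation (V(f,t) for the magnitude at frequency f in frame t) is assumed, not taken from the paper.

```latex
% MAP re-estimate of the frame-t mixture weights (valid for alpha_z >= 1):
\[
  \hat{P}_t(z) \;=\;
  \frac{\alpha_z - 1 + \sum_{f} V(f,t)\, P(z \mid f, t)}
       {\sum_{z'} \Bigl( \alpha_{z'} - 1 + \sum_{f} V(f,t)\, P(z' \mid f, t) \Bigr)}
\]
```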
HMM-Regularization for NMF-Based Noise Robust ASR
"... In this work we extend a previously proposed NMF-based technique for speech enhancement of noisy speech to exploit a Hidden Markov Model (HMM). The NMF-based technique works by finding a sparse representation of specrogram segments of noisy speech in a dictionary containing both speech and noise exe ..."
Abstract
-
Cited by 4 (0 self)
- Add to MetaCart
(Show Context)
In this work we extend a previously proposed NMF-based technique for speech enhancement of noisy speech to exploit a hidden Markov model (HMM). The NMF-based technique works by finding a sparse representation of spectrogram segments of noisy speech in a dictionary containing both speech and noise exemplars, and uses the activated dictionary atoms to create a time-varying filter to enhance the noisy speech. In order to take a larger temporal context into account and constrain the representation by the grammar of a speech recognizer, we propose to regularize the optimization problem by additionally minimizing the distance between state emission probabilities derived from the speech exemplar activations and a posteriori state probabilities derived by applying the forward-backward algorithm to the emission probabilities. Experiments on Track 1 of the 2nd CHiME Challenge, which contains small-vocabulary speech corrupted by both reverberation and authentic living-room noise at SNRs ranging from 9 to −6 dB, confirm the validity of the proposed technique. Index Terms: speech enhancement, exemplar-based, noise robustness, non-negative matrix factorization, hidden Markov models.
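Schematically, the regularised problem described above can be written as follows. All symbols are assumptions for illustration: D_s and D_n are the speech and noise exemplar dictionaries, H the nonnegative activations (H_s its speech part), e(H_s) the state emission probabilities derived from the speech activations, γ the forward-backward state posteriors, and λ, μ the sparsity and regularization weights.

```latex
% Exemplar-based enhancement with HMM regularization (schematic form):
\[
  \min_{\mathbf{H} \ge 0}\;
  d_{\mathrm{KL}}\!\bigl(\mathbf{V} \,\big\|\, [\mathbf{D}_s\;\mathbf{D}_n]\,\mathbf{H}\bigr)
  \;+\; \lambda \lVert \mathbf{H} \rVert_1
  \;+\; \mu\, D\!\bigl(\mathbf{e}(\mathbf{H}_s),\, \boldsymbol{\gamma}\bigr)
\]
```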
Second fiddle is important too: pitch tracking individual voices in polyphonic music
- In Proc. ISMIR, 2012
"... ABSTRACT Recently, there has been much interest in automatic pitch estimation and note tracking of polyphonic music. To date, however, most techniques produce a representation where pitch estimates are not associated with any particular instrument or voice. Therefore, the actual tracks for each ins ..."
Abstract
-
Cited by 3 (0 self)
- Add to MetaCart
(Show Context)
Recently, there has been much interest in automatic pitch estimation and note tracking of polyphonic music. To date, however, most techniques produce a representation in which pitch estimates are not associated with any particular instrument or voice, so the actual tracks for each instrument are not readily accessible. Access to individual tracks is needed for more complete music transcription, and additionally provides a window onto the analysis of higher constructs such as counterpoint and instrument theme imitation during a composition. In this paper, we present a method for tracking the pitches (F0s) of individual instruments in polyphonic music. The system uses a pre-learned dictionary of spectral basis vectors for each note of a variety of musical instruments. The method then formulates the tracking of pitches of individual voices in a probabilistic manner, attempting to explain the input spectrum as the most likely combination of musical instruments and notes drawn from the dictionary. The method has been evaluated on a subset of the MIREX multiple-F0 estimation test dataset, showing promising results.
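A toy version of the dictionary-explanation step, with assumed names: pre-learned note spectra for each instrument are stacked into one nonnegative dictionary, activations are fit per frame, and a naive per-frame argmax stands in for the paper's probabilistic formulation and temporal tracking.

```python
import numpy as np

def track_voices(V, dicts, n_iter=150, eps=1e-9):
    """`dicts` maps instrument name -> (F, n_notes) matrix of note
    spectra; returns one note index per frame for each instrument."""
    names = list(dicts)
    D = np.hstack([dicts[k] for k in names])       # (F, total notes)
    H = np.random.default_rng(0).random((D.shape[1], V.shape[1])) + eps
    for _ in range(n_iter):
        DH = D @ H + eps
        # KL-NMF multiplicative update of the note activations
        H *= (D.T @ (V / DH)) / (D.T.sum(1, keepdims=True) + eps)
    tracks, start = {}, 0
    for k in names:
        stop = start + dicts[k].shape[1]
        tracks[k] = H[start:stop].argmax(0)        # note index per frame
        start = stop
    return tracks
```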