CiteSeerX
Latent variable decomposition of spectrograms for single channel speaker separation (2005)

by B Raj, P Smaragdis
Venue: IEEE WASPAA
Results 1 - 10 of 23

Bayesian extensions to non-negative matrix factorisation for audio signal modelling

by Tuomas Virtanen, A. Taylan Cemgil, Simon Godsill - in ICASSP, 2008
"... We describe the underlying probabilistic generative signal model of non-negative matrix factorisation (NMF) and propose a realistic conjugate priors on the matrices to be estimated. A conjugate Gamma chain prior enables modelling the spectral smoothness of natural sounds in general, and other prior ..."
Abstract - Cited by 42 (6 self) - Add to MetaCart
We describe the underlying probabilistic generative signal model of non-negative matrix factorisation (NMF) and propose realistic conjugate priors on the matrices to be estimated. A conjugate Gamma chain prior enables modelling the spectral smoothness of natural sounds in general, and other prior knowledge about the spectra of the sounds can be used without resorting to overly restrictive techniques where some of the parameters are fixed. The resulting algorithm, while retaining the attractive features of standard NMF such as fast convergence and easy implementation, outperforms existing NMF strategies in a single channel audio source separation and detection task. Index Terms: acoustic signal processing, matrix decomposition, MAP estimation, source separation

Citation Context

... robust inference. Alternatively, one can learn a set of basis vectors from a training corpus where each source is present in isolation, and then keep the basis vectors fixed and estimate their gains [6]. This latter approach produces good results when the spectral characteristics of the training data are equal to those of the target data. In practice, however, the exact characteristics of the target...

Probabilistic latent variable models as nonnegative factorizations

by Madhusudana Shashanka, Bhiksha Raj, Paris Smaragdis - Computational Intelligence and Neuroscience, 2008. Article ID 947438
"... This paper presents a family of probabilistic latent variable models that can be used for analysis of nonnegative data. We show that there are strong ties between nonnegative matrix factorization and this family, and provide some straightforward extensions which can help in dealing with shift invar ..."
Abstract - Cited by 38 (6 self) - Add to MetaCart
This paper presents a family of probabilistic latent variable models that can be used for analysis of nonnegative data. We show that there are strong ties between nonnegative matrix factorization and this family, and provide some straightforward extensions which can help in dealing with shift invariances, higher-order decompositions and sparsity constraints. We argue through these extensions that the use of this approach allows for rapid development of complex statistical models for analyzing nonnegative data.
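The NMF side of this equivalence can be made concrete. Below is a minimal, illustrative sketch (our own, not code from the paper) of KL-divergence NMF with the standard multiplicative updates; the latent variable models discussed above estimate the same kind of nonnegative factors probabilistically. All names are ours.

```python
import numpy as np

def nmf_kl(V, rank, n_iter=200, seed=0):
    """Factor a nonnegative matrix V ~ W @ H under KL divergence."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, rank)) + 1e-6   # basis spectra (columns)
    H = rng.random((rank, T)) + 1e-6   # time-varying gains
    for _ in range(n_iter):
        # Multiplicative updates that monotonically decrease KL(V || WH)
        WH = W @ H
        H *= (W.T @ (V / WH)) / W.sum(axis=0, keepdims=True).T
        WH = W @ H
        W *= ((V / WH) @ H.T) / H.sum(axis=1, keepdims=True).T
    return W, H
```

With a magnitude spectrogram as V, the columns of W play the role of the P(f|z) distributions and H the per-frame component weights, up to normalization.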

Citation Context

...See [17], [18] for detailed derivation of the update equations. The equivalence between NMF and PLSA has also been pointed out by [19]. The model has been used for the analysis of audio spectra (e.g., [20]), images (e.g., [17], [21]) and text corpora (e.g., [7]). IV. MODEL EXTENSIONS The popularity of NMF comes mainly from its empirical success in finding “useful components” from the data. As pointed out...

Sound Source Separation in Monaural Music Signals

by Tuomas Virtanen, 2006
"... Sound source separation refers to the task of estimating the signals produced by individual sound sources from a complex acoustic mixture. It has several applications, since monophonic signals can be processed more efficiently and flexibly than polyphonic mixtures. This thesis deals with the separat ..."
Abstract - Cited by 36 (4 self) - Add to MetaCart
Sound source separation refers to the task of estimating the signals produced by individual sound sources from a complex acoustic mixture. It has several applications, since monophonic signals can be processed more efficiently and flexibly than polyphonic mixtures. This thesis deals with the separation of monaural, or one-channel, music recordings. We concentrate on separation methods where the sources to be separated are not known beforehand. Instead, the separation is enabled by utilizing the common properties of real-world sound sources, which are their continuity, sparseness, and repetition in time and frequency, and their harmonic spectral structures. One of the separation approaches taken here uses unsupervised learning and the other uses model-based inference based on sinusoidal modeling. Most of the existing unsupervised separation algorithms are based on a linear instantaneous signal model, where each frame of the input mixture signal is

Citation Context

...resent simultaneously at all times, the algorithm is likely to represent them with a single component. One possibility to motivate NMF is the probabilistic interpretation given by Raj and Smaragdis [143], who considered the gains and basis functions as probability distributions conditional to each component. This allows deriving the multiplicative updates (2.23)-(2.24) from the expectation maximizati...

A Sparse Non-Parametric Approach for Single Channel Separation of Known Sounds

by Paris Smaragdis, Madhusudana Shashanka, Bhiksha Raj
"... In this paper we present an algorithm for separating mixed sounds from a monophonic recording. Our approach makes use of training data which allows us to learn representations of the types of sounds that compose the mixture. In contrast to popular methods that attempt to extract compact generalizabl ..."
Abstract - Cited by 21 (8 self) - Add to MetaCart
In this paper we present an algorithm for separating mixed sounds from a monophonic recording. Our approach makes use of training data which allows us to learn representations of the types of sounds that compose the mixture. In contrast to popular methods that attempt to extract compact generalizable models for each sound from training data, we employ the training data itself as a representation of the sources in the mixture. We show that mixtures of known sounds can be described as sparse combinations of the training data itself, and in doing so produce significantly better separation results as compared to similar systems based on compact statistical models. Keywords: Example-Based Representation, Signal Separation, Sparse Models. 1

Citation Context

...acterizations may include codebooks [1], Gaussian mixture densities [2], HMMs [3], independent components [4, 5], sparse dictionaries [6], non-negative decompositions [7–9] and latent variable models [10, 11]. All of these methods attempt to derive a generalizable model that captures the salient characteristics of each source. Separation is achieved by abstracting components from the mixed signal that con...

Sparse overcomplete decomposition for single channel speaker separation

by Madhusudana V. S. Shashanka, Bhiksha Raj - In ICASSP, 2007
"... We present an algorithm for separating multiple speakers from a mixed single channel recording. The algorithm is based on a model proposed by Raj and Smaragdis [6]. The idea is to extract certain characteristic spectro-temporal basis functions from training data for individual speakers and decompose ..."
Abstract - Cited by 12 (6 self) - Add to MetaCart
We present an algorithm for separating multiple speakers from a mixed single channel recording. The algorithm is based on a model proposed by Raj and Smaragdis [6]. The idea is to extract certain characteristic spectro-temporal basis functions from training data for individual speakers and decompose the mixed signals as linear combinations of these learned bases. In other words, their model extracts a compact code of basis functions that can explain the space spanned by spectral vectors of a speaker. In our model, we generate a sparse-distributed code where we have more basis functions than the dimensionality of the space. We propose a probabilistic framework to achieve sparsity. Experiments show that the resulting sparse code better captures the structure in data and hence leads to better separation.
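As an illustration of the gain-estimation step such dictionary-based separators share, here is a hedged sketch (our own construction, not the paper's probabilistic formulation) of fitting nonnegative gains over a fixed, possibly overcomplete set of basis spectra, with an L1 penalty standing in for the sparsity prior:

```python
import numpy as np

def sparse_gains(v, W, lam=0.1, n_iter=200):
    """Estimate nonnegative gains h so that v ~ W @ h, with L1 sparsity.

    v : nonnegative spectral frame, shape (F,)
    W : fixed dictionary of basis spectra, shape (F, K), possibly K > F
    """
    K = W.shape[1]
    h = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # KL-NMF-style multiplicative gain update; lam in the denominator
        # penalizes large gains, encouraging a sparse code
        h *= (W.T @ (v / (W @ h))) / (W.sum(axis=0) + lam)
        h = np.maximum(h, 1e-12)
    return h
```

Separation would then proceed by grouping the active atoms by which speaker's training data they came from and reconstructing each speaker's share of the frame.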

Citation Context

...677 Beacon St, Boston MA 02215 ABSTRACT We present an algorithm for separating multiple speakers from a mixed single channel recording. The algorithm is based on a model proposed by Raj and Smaragdis [6]. The idea is to extract certain characteristic spectro-temporal basis functions from training data for individual speakers and decompose the mixed signals as linear combinations of these learned base...

Prior structures for time-frequency energy distributions

by Ali Taylan Cemgil, Paul Peeling, Onur Dikmen, Simon Godsill - in Proceedings of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA '07), 2007
"... We introduce a framework for probabilistic modelling of timefrequency energy distributions based on correlated Gamma and inverse Gamma random variables. One advantage of the approach is that the resulting class of models are conjugate which makes inference easier. Moreover, both positivity and addit ..."
Abstract - Cited by 9 (3 self) - Add to MetaCart
We introduce a framework for probabilistic modelling of time-frequency energy distributions based on correlated Gamma and inverse Gamma random variables. One advantage of the approach is that the resulting class of models is conjugate, which makes inference easier. Moreover, both positivity and additivity follow naturally in this framework. We illustrate how generic models (applicable to a broad class of signals) and more specialised models can be designed to model harmonicity, spectral continuity and/or changepoints. We show simulation results that illustrate the potential of the approach on a large spectrum of audio processing applications such as denoising, source separation and transcription.

Citation Context

...ent of a signal is typically time-varying, hence it is natural to model audio with a process with a time-varying power spectral density on a time-frequency plane using switches [9, 2, 10], a histogram [11] or source-filter models in the cepstral domain [12]. In this paper, we follow a transform domain modelling approach and focus on the following hierarchical source model p(s|v)p(v) = ∏ p(s_{ν,τ} | v_{ν,τ}) p(v...

SEPARATING A FOREGROUND SINGER FROM BACKGROUND MUSIC

by Bhiksha Raj
"... In this paper we present a algorithm for separating singing voices from background music in popular songs. The algorithm is derived by modelling the magnitude spectrogram of audio signals as the outcome of draws from a discrete bi-variate random process that generates time-frequency pairs. The spect ..."
Abstract - Cited by 9 (0 self) - Add to MetaCart
In this paper we present an algorithm for separating singing voices from background music in popular songs. The algorithm is derived by modelling the magnitude spectrogram of audio signals as the outcome of draws from a discrete bi-variate random process that generates time-frequency pairs. The spectrogram of a song is assumed to have been obtained through draws from the distributions underlying the music and the vocals, respectively. The parameters of the underlying distribution are learnt from the observed spectrogram of the song. The spectrogram of the separated vocals is then derived by estimating the fraction of draws that were obtained from its distribution. In the paper we present the algorithm within a framework that allows personalization of popular songs, by separating out the vocals, processing them appropriately to one's own tastes, and remixing them. Our experiments reveal that we are effectively able to separate out the vocals in a song and personalize them to our tastes.

Citation Context

...r. 4. SEPARATING COMPONENT SIGNALS FROM A MIXTURE The statistical model presented in Section 3 can be used to separate out component signals from a signal, such as the speakers from a mixed recording [7]. The set of basis vectors described by the frequency marginals P (f|z) are learned for each component signal in the mixture from a separate unmixed training recording. Let Pi(t, f) represent the dist...

Latent Dirichlet decomposition for single channel speaker separation

by Bhiksha Raj, Paris Smaragdis, Madhusudana V. S. Shashanka - In ICASSP 2006
"... We present an algorithm for the separation of multiple speakers from mixed single-channel recordings by latent variable decomposition of the speech spectrogram. We model each magnitude spectral vector in the short-time Fourier transform of a speech signal as the outcome of a discrete random process ..."
Abstract - Cited by 6 (4 self) - Add to MetaCart
We present an algorithm for the separation of multiple speakers from mixed single-channel recordings by latent variable decomposition of the speech spectrogram. We model each magnitude spectral vector in the short-time Fourier transform of a speech signal as the outcome of a discrete random process that generates frequency bin indices. The distribution of the process is modelled as a mixture of multinomial distributions, such that the mixture weights of the component multinomials vary from analysis window to analysis window. The component multinomials are assumed to be speaker specific and are learnt from training signals for each speaker. We model the prior distribution of the mixture weights for each speaker as a Dirichlet distribution. The distributions representing magnitude spectral vectors for the mixed signal are decomposed into mixtures of the multinomials for all component speakers. The frequency distribution, i.e. the spectrum, for each speaker is reconstructed from this decomposition.

Citation Context

...dentifies typical spectral structures for speakers through latent-variable decomposition of their magnitude spectra. The latent-variable model for speaker separation, originally proposed by Raj et al. [6], assumes that spectral vectors of speech are the outcomes of a discrete random process that generates frequency bin indices. Each analysis window (frame) of 1 The term speaker here refers to a person...

HMM-REGULARIZATION FOR NMF-BASED NOISE ROBUST ASR

by Jort F. Gemmeke, Antti Hurmalainen, Tuomas Virtanen
"... In this work we extend a previously proposed NMF-based technique for speech enhancement of noisy speech to exploit a Hidden Markov Model (HMM). The NMF-based technique works by finding a sparse representation of specrogram segments of noisy speech in a dictionary containing both speech and noise exe ..."
Abstract - Cited by 4 (0 self) - Add to MetaCart
In this work we extend a previously proposed NMF-based technique for speech enhancement of noisy speech to exploit a Hidden Markov Model (HMM). The NMF-based technique works by finding a sparse representation of spectrogram segments of noisy speech in a dictionary containing both speech and noise exemplars, and uses the activated dictionary atoms to create a time-varying filter to enhance the noisy speech. In order to take into account larger temporal context and constrain the representation by the grammar of a speech recognizer, we propose to regularize the optimization problem by additionally minimizing the distance between state emission probabilities derived from the speech exemplar activations, and a posteriori state probabilities derived by applying the Forward-Backward algorithm to the emission probabilities. Experiments on Track 1 of the 2nd CHiME Challenge, which contains small vocabulary speech corrupted by both reverberation and authentic living room noise at varying SNRs ranging from 9 to -6 dB, confirm the validity of the proposed technique. Index Terms: speech enhancement, exemplar-based, noise robustness, Non-Negative Matrix Factorization, Hidden Markov Models
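The time-varying filter step described here can be sketched in a few lines. This is a simplified illustration under our own naming (the paper works on multi-frame spectrogram segments and adds the HMM regularization, both omitted): the speech share of the exemplar reconstruction is used as a soft, Wiener-like mask on the noisy spectrum.

```python
import numpy as np

def speech_mask(y, A_speech, A_noise, x):
    """Enhance one noisy spectral frame from exemplar activations.

    y : noisy magnitude spectrum, shape (F,)
    A_speech, A_noise : speech and noise exemplar dictionaries, shape (F, Ks), (F, Kn)
    x : nonnegative activations for the stacked dictionary, shape (Ks + Kn,)
    """
    k = A_speech.shape[1]
    s = A_speech @ x[:k]                   # speech part of the reconstruction
    n = A_noise @ x[k:]                    # noise part of the reconstruction
    mask = s / np.maximum(s + n, 1e-12)    # time-varying filter in [0, 1]
    return mask * y                        # enhanced spectral frame
```

When the noise activations are zero the mask is all ones and the frame passes through unchanged; as noise atoms activate, the corresponding bins are attenuated.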

Citation Context

...mplex sounds as being composed of purely additive combinations of spectral atoms, has proven to be adept at separating the target speech from interfering sounds such as noise [7, 8], other speakers [9, 10], music [11–13] and even reverberation [14]. For noise-robust automatic The research of Jort F. Gemmeke was funded by IWT-SBO project ALADIN contract 100049. Tuomas Virtanen has been funded by the Aca...

fiddle is important too: pitch tracking individual voices in polyphonic music

by Mert Bay, Andreas F. Ehmann, James W. Beauchamp, Paris Smaragdis, J. Stephen Downie - in Proc. ISMIR, 2012
"... ABSTRACT Recently, there has been much interest in automatic pitch estimation and note tracking of polyphonic music. To date, however, most techniques produce a representation where pitch estimates are not associated with any particular instrument or voice. Therefore, the actual tracks for each ins ..."
Abstract - Cited by 3 (0 self) - Add to MetaCart
Recently, there has been much interest in automatic pitch estimation and note tracking of polyphonic music. To date, however, most techniques produce a representation where pitch estimates are not associated with any particular instrument or voice. Therefore, the actual tracks for each instrument are not readily accessible. Access to individual tracks is needed for more complete music transcription and additionally will provide a window to the analysis of higher constructs such as counterpoint and instrument theme imitation during a composition. In this paper, we present a method for tracking the pitches (F0s) of individual instruments in polyphonic music. The system uses a pre-learned dictionary of spectral basis vectors for each note for a variety of musical instruments. The method then formulates the tracking of pitches of individual voices in a probabilistic manner by attempting to explain the input spectrum as the most likely combination of musical instruments and notes drawn from the dictionary. The method has been evaluated on a subset of the MIREX multiple-F0 estimation test dataset, showing promising results.

Citation Context

... regular PLCA to build dictionaries of spectra indexed by F0 and instrument where the spectra are analyzed from recordings of individual notes of different musical instruments. We extend the model of [16] to represent each source/instrument not by only one spectral dictionary, but rather with a collection of dictionaries, each of which is an ensemble of spectral basis vectors that have the same F0. We...


Developed at and hosted by The College of Information Sciences and Technology

© 2007-2019 The Pennsylvania State University