Results 1–10 of 54
Single-channel speech separation using sparse non-negative matrix factorization
In International Conference on Spoken Language Processing (INTERSPEECH), 2006
Cited by 68 (4 self)
Abstract:
We apply machine learning techniques to the problem of separating multiple speech sources from a single microphone recording. The method of choice is a sparse non-negative matrix factorization algorithm, which in an unsupervised manner can learn sparse representations of the data. This is applied to the learning of personalized dictionaries from a speech corpus, which in turn are used to separate the audio stream into its components. We show that computational savings can be achieved by segmenting the training data on a phoneme level. To split the data, a conventional speech recognizer is used. The performance of the unsupervised and supervised adaptation schemes results in significant improvements in terms of the target-to-masker ratio. Index Terms: Single-channel source separation, sparse non-negative matrix factorization.
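The core idea of the abstract above, learning nonnegative dictionaries with a sparsity penalty on the activations, can be sketched with generic multiplicative updates. This is a minimal illustration, not the paper's exact algorithm: the L1-penalized Euclidean objective and the column-normalization scheme are assumptions.

```python
import numpy as np

def sparse_nmf(V, rank, sparsity=0.1, n_iter=200, seed=0):
    """Multiplicative-update NMF with an L1 penalty on the activations H,
    approximately minimizing ||V - W H||_F^2 + sparsity * sum(H)."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + 1e-3
    H = rng.random((rank, m)) + 1e-3
    for _ in range(n_iter):
        # H update: the sparsity weight enters the denominator
        H *= (W.T @ V) / (W.T @ W @ H + sparsity + 1e-12)
        # W update: standard Euclidean multiplicative rule
        W *= (V @ H.T) / (W @ H @ H.T + 1e-12)
        # normalize columns of W so the penalty acts on H alone
        norms = np.linalg.norm(W, axis=0, keepdims=True) + 1e-12
        W /= norms
        H *= norms.T
    return W, H
```

In a separation setting, one such dictionary W would be learned per speaker; at test time the mixture is decomposed against the concatenated dictionaries and each source is reconstructed from its own components.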
Transcription and separation of drum signals from polyphonic music
IEEE Trans. on Audio, Speech and Language Processing, 2008
Cited by 41 (6 self)
Abstract:
The purpose of this article is to present new advances in music transcription and source separation with a focus on drum signals. A complete drum transcription system is described, which combines information from the original music signal and a drum-track enhanced version obtained by source separation. In addition to efficient fusion strategies to take into account these two complementary sources of information, the transcription system integrates a large set of features, optimally selected by feature selection. Concurrently, the problem of drum track extraction from polyphonic music is tackled both by proposing a novel approach based on harmonic/noise decomposition and time/frequency masking and by improving an existing Wiener-filtering-based separation method. The separation and transcription techniques presented are thoroughly evaluated on a large public database of music signals. A transcription accuracy between 64.5% and 80.3% is obtained, depending on the drum instrument, for well-balanced mixes, and the efficiency of our drum separation algorithms is illustrated in a comprehensive benchmark. Index Terms: Drum signals, feature selection, harmonic/noise decomposition, music transcription, source separation, support vector machine (SVM), Wiener filtering.
Probabilistic latent variable models as nonnegative factorizations
Computational Intelligence and Neuroscience, 2008, Article ID 947438
Cited by 38 (6 self)
Abstract:
This paper presents a family of probabilistic latent variable models that can be used for analysis of nonnegative data. We show that there are strong ties between nonnegative matrix factorization and this family, and provide some straightforward extensions which can help in dealing with shift invariances, higher-order decompositions, and sparsity constraints. We argue through these extensions that the use of this approach allows for rapid development of complex statistical models for analyzing nonnegative data.
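The tie the abstract mentions can be seen in a small EM fit of the latent-variable model P(f,t) = Σ_z P(z) P(f|z) P(t|z), which is equivalent, up to normalization, to NMF with the KL divergence. This is a generic PLSA-style sketch, not the authors' exact formulation.

```python
import numpy as np

def plca(V, n_comp, n_iter=100, seed=0):
    """EM for P(f,t) = sum_z P(z) P(f|z) P(t|z) fitted to the
    normalized nonnegative matrix V."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    P = V / V.sum()                           # treat V as a joint distribution
    Pf = rng.random((F, n_comp)); Pf /= Pf.sum(0)
    Pt = rng.random((n_comp, T)); Pt /= Pt.sum(1, keepdims=True)
    Pz = np.full(n_comp, 1.0 / n_comp)
    for _ in range(n_iter):
        # E step folded into the M step via the ratio data/model
        joint = Pf * Pz                        # F x Z, P(f, z)
        model = joint @ Pt + 1e-12             # F x T, current estimate of P
        R = P / model
        # M step: reweight each factor by the expected counts
        Pf_new = joint * (R @ Pt.T)            # unnormalized F x Z
        Pt_new = Pt * (joint.T @ R)            # unnormalized Z x T
        Pz = Pf_new.sum(0)
        Pf = Pf_new / (Pf_new.sum(0) + 1e-12)
        Pt = Pt_new / (Pt_new.sum(1, keepdims=True) + 1e-12)
        Pz /= Pz.sum()
    return Pf, Pz, Pt
```

Identifying W with the columns P(f|z) and H with P(z)P(t|z) recovers the familiar NMF factor pair.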
Sound Source Separation in Monaural Music Signals
2006
Cited by 36 (4 self)
Abstract:
Sound source separation refers to the task of estimating the signals produced by individual sound sources from a complex acoustic mixture. It has several applications, since monophonic signals can be processed more efficiently and flexibly than polyphonic mixtures. This thesis deals with the separation of monaural, or one-channel, music recordings. We concentrate on separation methods where the sources to be separated are not known beforehand. Instead, the separation is enabled by utilizing the common properties of real-world sound sources, which are their continuity, sparseness, and repetition in time and frequency, and their harmonic spectral structures. One of the separation approaches taken here uses unsupervised learning and the other uses model-based inference based on sinusoidal modeling. Most of the existing unsupervised separation algorithms are based on a linear instantaneous signal model, where each frame of the input mixture signal is
Shift-invariant probabilistic latent component analysis
Journal of Machine Learning Research (under review), 2008
Cited by 26 (5 self)
Abstract:
In this paper we present a model which can decompose probability densities or count data into a set of shift-invariant components. We begin by introducing a regular latent variable model and subsequently extend it to deal with shift invariance in order to model more complex inputs. We develop an expectation-maximization algorithm for estimating the components and present various results on challenging real-world data. We show that this approach is a probabilistic generalization of well-known algorithms such as Non-Negative Matrix Factorization and multiway decompositions, and discuss its advantages over such approaches.
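A non-probabilistic relative of shift-invariant PLCA is convolutive NMF (NMFD), where each component is a short time-frequency patch shifted across time: V[f,t] ≈ Σ_z Σ_τ W[f,z,τ] H[z,t−τ]. The sketch below uses standard KL multiplicative updates for that model; it illustrates the shift-invariance idea but is not the authors' EM algorithm.

```python
import numpy as np

def shift(X, t):
    """Shift the columns of X right by t (left if t < 0), zero-padding."""
    Y = np.zeros_like(X)
    if t == 0:
        Y[:] = X
    elif t > 0:
        Y[:, t:] = X[:, :-t]
    else:
        Y[:, :t] = X[:, -t:]
    return Y

def nmfd(V, rank, n_shifts, n_iter=100, seed=0):
    """Convolutive NMF with KL multiplicative updates."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, rank, n_shifts)) + 0.1
    H = rng.random((rank, T)) + 0.1
    ones = np.ones_like(V)
    for _ in range(n_iter):
        Lam = sum(W[:, :, t] @ shift(H, t) for t in range(n_shifts))
        R = V / (Lam + 1e-12)
        # update each time-slice of the shift-invariant bases
        for t in range(n_shifts):
            Ht = shift(H, t)
            W[:, :, t] *= (R @ Ht.T) / (ones @ Ht.T + 1e-12)
        Lam = sum(W[:, :, t] @ shift(H, t) for t in range(n_shifts))
        R = V / (Lam + 1e-12)
        # update the activations, aggregating over all shifts
        num = sum(W[:, :, t].T @ shift(R, -t) for t in range(n_shifts))
        den = sum(W[:, :, t].T @ ones for t in range(n_shifts))
        H *= num / (den + 1e-12)
    return W, H
```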
Multipitch Analysis with Harmonic Nonnegative Matrix Approximation
In ISMIR 2007, 8th International Conference on Music Information Retrieval, 2007
Cited by 25 (4 self)
Abstract:
This paper presents a new approach to multipitch analysis utilizing the Harmonic Nonnegative Matrix Approximation, a harmonically constrained and penalized version of the Nonnegative Matrix Approximation (NNMA) method. It also includes a description of a note onset, offset, and amplitude retrieval procedure based on that technique. Compared with previous NNMA approaches, a specific initialization of the basis matrix is employed: the basis matrix is initialized with zeros everywhere except at positions corresponding to the harmonic frequencies of consecutive notes of the equal temperament scale. This results in the basis containing nothing but harmonically structured vectors, even after the learning process, and in the activity matrix's rows containing peaks corresponding to note onset times and amplitudes. Furthermore, additional penalties of mutual uncorrelation and sparseness of rows are placed upon the activity matrix. The proposed method is able to uncover the underlying musical structure better than previous NNMA approaches and makes the note detection process very straightforward.
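The initialization described above can be sketched as follows. The linear FFT bin layout, the one-bin-wide Gaussian peak shape, the 1/h harmonic decay, and the MIDI note range are all illustrative assumptions, not details from the paper.

```python
import numpy as np

def harmonic_basis(n_bins, sr, f0s, n_harmonics=10):
    """Basis matrix whose columns are zero except near the harmonics
    of each candidate note frequency in f0s."""
    freqs = np.linspace(0.0, sr / 2.0, n_bins)
    width = freqs[1] - freqs[0]              # peak width: one bin
    W = np.zeros((n_bins, len(f0s)))
    for j, f0 in enumerate(f0s):
        for h in range(1, n_harmonics + 1):
            fh = h * f0
            if fh > sr / 2.0:
                break
            # narrow Gaussian bump at each harmonic, decaying with h
            W[:, j] += np.exp(-0.5 * ((freqs - fh) / width) ** 2) / h
        W[:, j] /= W[:, j].sum() + 1e-12     # normalize each column
    return W

# candidate notes of the equal-temperament scale (MIDI 45..69, A2..A4)
midi = np.arange(45, 70)
f0s = 440.0 * 2.0 ** ((midi - 69) / 12.0)
W0 = harmonic_basis(1025, 44100, f0s)
```

Because multiplicative NMF updates never turn a zero entry nonzero, a basis initialized this way stays harmonically structured throughout learning, which is exactly the property the abstract relies on.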
Explicit consistency constraints for STFT spectrograms and their application to phase reconstruction
In Proc., 2008
Cited by 11 (5 self)
Abstract:
As many acoustic signal processing methods, for example for source separation or noise canceling, operate in the magnitude spectrogram domain, the problem of reconstructing a perceptually good-sounding signal from a modified magnitude spectrogram, and more generally of understanding what makes a spectrogram consistent, is very important. In this article, we derive the constraints which a set of complex numbers must verify to be a consistent STFT spectrogram, i.e. to be the STFT spectrogram of a real signal, and describe how they lead to an objective function measuring the consistency of a set of complex numbers as a spectrogram. We then present a flexible phase reconstruction algorithm based on a local approximation of the consistency constraints, explain its relation to phase-coherence conditions devised as necessary for a good perceptual sound quality, and derive a real-time time-scale modification algorithm based on sliding-block analysis. Finally, we show how inconsistency can be used to develop a spectrogram-based audio encryption scheme.
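The classic way to exploit consistency for phase reconstruction is the Griffin-Lim iteration: alternately project onto the set of consistent spectrograms (apply inverse then forward STFT) and re-impose the target magnitude. The sketch below, with a hand-rolled framed STFT so shapes stay fixed, shows that baseline; it is related to, but not the same as, the local-approximation algorithm of the cited paper.

```python
import numpy as np

def frame_stft(x, win, hop):
    """STFT by explicit framing; returns an (n_frames, n_bins) array."""
    n = win.size
    starts = range(0, x.size - n + 1, hop)
    frames = np.array([x[i:i + n] * win for i in starts])
    return np.fft.rfft(frames, axis=1)

def frame_istft(S, win, hop, length):
    """Weighted overlap-add inverse of frame_stft."""
    n = win.size
    x = np.zeros(length)
    norm = np.zeros(length)
    frames = np.fft.irfft(S, n=n, axis=1)
    for k, fr in enumerate(frames):
        i = k * hop
        x[i:i + n] += fr * win
        norm[i:i + n] += win ** 2
    return x / np.maximum(norm, 1e-12)

def griffin_lim(mag, win, hop, length, n_iter=50, seed=0):
    """Iterative phase reconstruction from a magnitude spectrogram."""
    rng = np.random.default_rng(seed)
    S = mag * np.exp(2j * np.pi * rng.random(mag.shape))
    for _ in range(n_iter):
        x = frame_istft(S, win, hop, length)   # project onto consistent
        S = frame_stft(x, win, hop)            # spectrograms (iSTFT, STFT)
        S = mag * np.exp(1j * np.angle(S))     # keep phase, restore magnitude
    return frame_istft(S, win, hop, length)
```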
Computational auditory induction as a missing-data model-fitting problem with Bregman divergence
Speech Communication, 2010
Cited by 11 (2 self)
Abstract:
The human auditory system has the ability, known as auditory induction, to estimate the missing parts of a continuous auditory stream briefly covered by noise and perceptually resynthesize them. Humans are thus able to simultaneously analyze an auditory scene and reconstruct the underlying signal. In this article, we formulate this ability as a nonnegative matrix factorization (NMF) problem with unobserved data, and show how to solve it using an auxiliary function method. We explain how this method can also be generally related to the EM algorithm, enabling the use of prior distributions on the parameters. We show how sparseness is a key to global feature extraction, and that our method is ideally able to extract patterns which never occur completely. We finally illustrate on an example how our method is able to simultaneously analyze a scene and interpolate the gaps in it.
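The "NMF with unobserved data" formulation can be illustrated with masked multiplicative updates: only the observed entries contribute to the fit, and the low-rank product W H then interpolates the gaps. This generic masked Euclidean NMF stands in for the article's auxiliary-function algorithm and its Bregman-divergence generalization.

```python
import numpy as np

def masked_nmf(V, M, rank, n_iter=300, seed=0):
    """Euclidean NMF fitted only on entries where the binary mask
    M == 1; W @ H interpolates the masked-out (unobserved) cells."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + 0.1
    H = rng.random((rank, m)) + 0.1
    Vm = M * V
    for _ in range(n_iter):
        # masked multiplicative updates: unobserved cells contribute nothing
        H *= (W.T @ Vm) / (W.T @ (M * (W @ H)) + 1e-12)
        W *= (Vm @ H.T) / ((M * (W @ H)) @ H.T + 1e-12)
    return W, H
```

Sparseness matters here for the reason the abstract gives: a part-based, sparse dictionary can be estimated from fragments and still reconstruct patterns that never occur completely in the observed data.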
Nonnegative Matrix Factorization with Quadratic Programming
2006
Cited by 9 (2 self)
Abstract:
Nonnegative Matrix Factorization (NMF) solves the following problem: find nonnegative matrices A ∈ R+^(I×J) and X ∈ R+^(J×K) such that Y ≈ AX, given only Y ∈ R^(I×K) and the assigned index J (K ≫ I ≥ J). Basically, the factorization is achieved by alternating minimization of a given cost function subject to nonnegativity constraints. In the paper, we propose to use Quadratic Programming (QP) to solve the minimization problems. The Tikhonov-regularized squared Euclidean cost function is extended with a logarithmic barrier function (which enforces the nonnegativity constraints), and then, using a second-order Taylor expansion, a QP problem is formulated. This problem is solved with a trust-region subproblem algorithm. The numerical tests are performed on blind source separation problems.
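One way to get the flavor of the alternating scheme is alternating Tikhonov-regularized nonnegative least squares, where each column subproblem is itself a small QP. Here scipy's active-set `nnls` solver stands in for the log-barrier/trust-region machinery the abstract describes; the regularized objective is the same, the solver is not.

```python
import numpy as np
from scipy.optimize import nnls

def anls_nmf(Y, rank, n_iter=30, ridge=1e-3, seed=0):
    """Alternating NNLS for Y ~ A X with ridge (Tikhonov)
    regularization; each inner problem is solved exactly."""
    rng = np.random.default_rng(seed)
    I, K = Y.shape
    A = rng.random((I, rank))
    X = rng.random((rank, K))
    sq = np.sqrt(ridge)
    for _ in range(n_iter):
        # min ||A x - y_k||^2 + ridge ||x||^2, x >= 0, per column of X,
        # written as plain NNLS by stacking sqrt(ridge) * I under A
        Ar = np.vstack([A, sq * np.eye(rank)])
        for k in range(K):
            yk = np.concatenate([Y[:, k], np.zeros(rank)])
            X[:, k], _ = nnls(Ar, yk)
        # same for the rows of A via the transposed problem
        Xr = np.vstack([X.T, sq * np.eye(rank)])
        for i in range(I):
            yi = np.concatenate([Y[i, :], np.zeros(rank)])
            A[i, :], _ = nnls(Xr, yi)
    return A, X
```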
Fast Nonnegative Matrix Factorization Algorithms Using Projected Gradient Approaches for Large-Scale Problems
2008
Cited by 8 (0 self)
Abstract:
Recently, a considerable growth of interest in projected gradient (PG) methods has been observed, due to their high efficiency in solving large-scale convex minimization problems subject to linear constraints. Since the minimization problems underlying nonnegative matrix factorization (NMF) of large matrices match this class of minimization problems well, we investigate and test some recent PG methods in the context of their applicability to NMF. In particular, the paper focuses on the following modified methods: projected Landweber, Barzilai-Borwein gradient projection, projected sequential subspace optimization (PSESOP), interior-point Newton (IPN), and sequential coordinate-wise. The proposed and implemented NMF PG algorithms are compared with respect to their performance in terms of signal-to-interference ratio (SIR) and elapsed time, using a simple benchmark of mixed, partially dependent nonnegative signals.
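The simplest member of the PG family the abstract surveys is plain alternating projected gradient: take a gradient step on each factor with a step size bounded by the subproblem's Lipschitz constant, then clip to the nonnegative orthant. A minimal sketch (not any of the specific variants compared in the paper):

```python
import numpy as np

def pg_nmf(V, rank, n_iter=200, seed=0):
    """Alternating projected gradient for min ||V - W H||_F^2
    subject to W >= 0, H >= 0, with a 1/L step size."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank))
    H = rng.random((rank, m))
    for _ in range(n_iter):
        # step on H with step 1/L, L = ||W^T W||_2, then project onto H >= 0
        L = np.linalg.norm(W.T @ W, 2) + 1e-12
        H = np.maximum(H - (W.T @ (W @ H - V)) / L, 0.0)
        # symmetric step on W
        L = np.linalg.norm(H @ H.T, 2) + 1e-12
        W = np.maximum(W - ((W @ H - V) @ H.T) / L, 0.0)
    return W, H
```

The variants in the paper (Barzilai-Borwein, PSESOP, IPN, sequential coordinate-wise) differ mainly in how they choose this step or solve the per-factor subproblem, which is where the large-scale speedups come from.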