Results 1–10 of 13
Monaural sound source separation by nonnegative matrix factorization with temporal continuity and sparseness criteria
 IEEE Trans. on Audio, Speech, and Language Processing, 2007
"... Abstract—An unsupervised learning algorithm for the separation of sound sources in onechannel music signals is presented. The algorithm is based on factorizing the magnitude spectrogram of an input signal into a sum of components, each of which has a fixed magnitude spectrum and a timevarying gain ..."
Abstract

Cited by 189 (30 self)
 Add to MetaCart
(Show Context)
Abstract—An unsupervised learning algorithm for the separation of sound sources in one-channel music signals is presented. The algorithm is based on factorizing the magnitude spectrogram of an input signal into a sum of components, each of which has a fixed magnitude spectrum and a time-varying gain. Each sound source, in turn, is modeled as a sum of one or more components. The parameters of the components are estimated by minimizing the reconstruction error between the input spectrogram and the model, while restricting the component spectrograms to be nonnegative and favoring components whose gains are slowly varying and sparse. Temporal continuity is favored by a cost term equal to the sum of squared differences between the gains in adjacent frames, and sparseness is favored by penalizing nonzero gains. The proposed iterative estimation algorithm is initialized with random values, and the gains and the spectra are then alternately updated using multiplicative update rules until the values converge. Simulation experiments were carried out using generated mixtures of pitched musical instrument samples and drum sounds. The performance of the proposed method was compared with independent subspace analysis and basic nonnegative matrix factorization, which are based on the same linear model. According to these simulations, the proposed method achieves better separation quality than the previous algorithms. In particular, the temporal continuity criterion improved the detection of pitched musical sounds; the sparseness criterion did not produce significant improvements. Index Terms—Acoustic signal analysis, audio source separation, blind source separation, music, nonnegative matrix factorization, sparse coding, unsupervised learning.
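The estimation loop this abstract describes — random initialization, then alternating multiplicative updates of spectra and gains under a temporal-continuity cost — can be sketched roughly as follows. This is a minimal Euclidean-cost illustration with heuristic update rules of my own, not the paper's exact algorithm (which minimizes a normalized divergence-based cost and also includes the sparseness term); the function name is invented for this sketch.

```python
import numpy as np

def nmf_temporal_continuity(V, K, alpha=0.1, n_iter=200, seed=0):
    """Factorize a nonnegative spectrogram V (freqs x frames) as W @ H,
    penalizing frame-to-frame changes in the gains H.

    Cost: ||V - W @ H||_F^2 + alpha * sum_k sum_t (H[k,t] - H[k,t-1])**2
    """
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, K)) + 1e-3   # fixed component spectra
    H = rng.random((K, T)) + 1e-3   # time-varying gains
    eps = 1e-9
    for _ in range(n_iter):
        # Standard Euclidean multiplicative update for the spectra.
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        # The continuity gradient at frame t is
        # alpha * (4*H[k,t] - 2*H[k,t-1] - 2*H[k,t+1]); its positive part
        # goes into the denominator, its negative part into the numerator.
        # Edge frames are padded by replication.
        left = np.concatenate([H[:, :1], H[:, :-1]], axis=1)
        right = np.concatenate([H[:, 1:], H[:, -1:]], axis=1)
        num = W.T @ V + 2.0 * alpha * (left + right)
        den = W.T @ W @ H + 4.0 * alpha * H + eps
        H *= num / den
    return W, H
```

With `alpha = 0` this reduces to the classical Lee–Seung Euclidean NMF updates; increasing `alpha` trades reconstruction accuracy for smoother gain curves.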
Sound Source Separation in Monaural Music Signals
, 2006
"... Sound source separation refers to the task of estimating the signals produced by individual sound sources from a complex acoustic mixture. It has several applications, since monophonic signals can be processed more efficiently and flexibly than polyphonic mixtures. This thesis deals with the separat ..."
Abstract

Cited by 36 (4 self)
 Add to MetaCart
(Show Context)
Sound source separation refers to the task of estimating the signals produced by individual sound sources from a complex acoustic mixture. It has several applications, since monophonic signals can be processed more efficiently and flexibly than polyphonic mixtures. This thesis deals with the separation of monaural, or one-channel, music recordings. We concentrate on separation methods where the sources to be separated are not known beforehand. Instead, the separation is enabled by utilizing the common properties of real-world sound sources: their continuity, sparseness, and repetition in time and frequency, and their harmonic spectral structures. One of the separation approaches taken here uses unsupervised learning, and the other uses model-based inference based on sinusoidal modeling. Most of the existing unsupervised separation algorithms are based on a linear instantaneous signal model, where each frame of the input mixture signal is …
Towards Autonomous Agents for Live Computer Music: Realtime Machine Listening and Interactive Music Systems
, 2006
"... ..."
Discovering Auditory Objects Through Non-Negativity Constraints
, 2004
"... Abstract We present a novel method for discovering auditory objects from scenes in a selforganized manner. Our approach is using nonnegativity constraints to find the building elements of a monaural auditory input. Surprisingly, although devoid of any statistical measures, this approach discovers ..."
Abstract

Cited by 33 (3 self)
 Add to MetaCart
(Show Context)
Abstract—We present a novel method for discovering auditory objects from scenes in a self-organized manner. Our approach uses non-negativity constraints to find the building elements of a monaural auditory input. Surprisingly, although devoid of any statistical measures, this approach discovers independent elements in the scene similarly to previously reported methods employing ICA algorithms. The use of non-negativity constraints makes this work best suited for spectral magnitude analysis and provides a fairly robust method for the discovery and extraction of auditory objects from scenes.
Estimating the spatial position of spectral components in audio
 in Proc. Int. Conf. on Independent Component Analysis and Blind Source Separation (ICA)
"... Abstract. One way of separating sources from a single mixture recording is by extracting spectral components and then combining them to form estimates of the sources. The grouping process remains a difficult problem. We propose, for instances when multiple mixture signals are available, clustering t ..."
Abstract

Cited by 14 (0 self)
 Add to MetaCart
(Show Context)
Abstract—One way of separating sources from a single mixture recording is to extract spectral components and then combine them to form estimates of the sources. The grouping process remains a difficult problem. We propose, for instances when multiple mixture signals are available, clustering the components based on their relative contribution to each mixture (i.e., their spatial position). We introduce novel factorizations of magnitude spectrograms from multiple recordings and derive update rules that extend independent subspace analysis and nonnegative matrix factorization to concurrently estimate the spectral shape, time envelope, and spatial position of each component. We show that estimated component positions lie near the positions of their corresponding sources, and that multichannel nonnegative matrix factorization can distinguish three pianos by their position in the mixture.
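One generic way to realize the concurrent estimation of spectral shape, time envelope, and per-channel (spatial) gain described in this abstract is a nonnegative CP-style tensor factorization with multiplicative updates. The sketch below assumes a simple Euclidean cost and is not the paper's actual update rules; all names are illustrative.

```python
import numpy as np

def multichannel_nmf(V, K, n_iter=500, seed=0):
    """Nonnegative CP-style factorization of stacked channel spectrograms
    V (channels x freqs x frames):

        V[c, f, t] ~= sum_k G[c, k] * W[f, k] * H[k, t]

    G holds each component's energy per channel, from which a spatial
    position (e.g. a left/right panning ratio) can be read off.
    """
    rng = np.random.default_rng(seed)
    C, F, T = V.shape
    G = rng.random((C, K)) + 1e-3   # per-channel component gains
    W = rng.random((F, K)) + 1e-3   # component spectra
    H = rng.random((K, T)) + 1e-3   # time envelopes
    eps = 1e-9
    model = lambda: np.einsum('ck,fk,kt->cft', G, W, H)
    for _ in range(n_iter):
        # Each factor gets the usual numerator/denominator ratio, with the
        # other two factors held fixed.
        G *= np.einsum('cft,fk,kt->ck', V, W, H) / \
             (np.einsum('cft,fk,kt->ck', model(), W, H) + eps)
        W *= np.einsum('cft,ck,kt->fk', V, G, H) / \
             (np.einsum('cft,ck,kt->fk', model(), G, H) + eps)
        H *= np.einsum('cft,ck,fk->kt', V, G, W) / \
             (np.einsum('cft,ck,fk->kt', model(), G, W) + eps)
    return G, W, H
```

A panning estimate for component `k` in a two-channel mixture would then be `G[1, k] / (G[0, k] + G[1, k])`.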
Monaural Sound Source Separation by Perceptually Weighted Non-Negative Matrix Factorization
"... Abstract — A dataadaptive algorithm for the separation of sound sources from onechannel signals is presented. The algorithm applies weighted nonnegative matrix factorization on the power spectrogram of the input signal. Perceptually motivated weights for each critical band in each frame are used ..."
Abstract

Cited by 10 (0 self)
 Add to MetaCart
(Show Context)
Abstract—A data-adaptive algorithm for the separation of sound sources from one-channel signals is presented. The algorithm applies weighted nonnegative matrix factorization to the power spectrogram of the input signal. Perceptually motivated weights for each critical band in each frame are used to model the loudness perception of the human auditory system. The method compresses high-energy components and enables the estimation of perceptually significant low-energy characteristics of sources. The power spectrogram is factorized into a sum of components, each of which has a fixed magnitude spectrum with a time-varying gain. Each source consists of one or more components. The parameters of the components are estimated by minimizing the weighted divergence between the observed power spectrogram and the model, for which a weighted nonnegative matrix factorization algorithm is proposed. Simulation experiments were carried out using generated mixtures of pitched musical instrument samples and percussive sounds. The performance of the proposed method was compared with other separation algorithms based on the same signal model, including independent subspace analysis and sparse coding. According to the simulations, the proposed method achieves perceptually better separation quality than the existing algorithms. Demonstration signals are available at
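The weighted factorization step this abstract refers to can be illustrated with the standard weighted-Euclidean multiplicative updates below. The perceptual part — the critical-band loudness weights — is not reproduced here: `Lam` stands in for any nonnegative weight matrix, and the paper itself minimizes a weighted divergence rather than this squared-error cost, so treat this as a sketch of the general technique only.

```python
import numpy as np

def weighted_nmf(V, Lam, K, n_iter=200, seed=0):
    """Weighted NMF: minimize
        sum_{f,t} Lam[f,t] * (V[f,t] - (W @ H)[f,t])**2
    over nonnegative W (freqs x K) and H (K x frames)."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, K)) + 1e-3
    H = rng.random((K, T)) + 1e-3
    eps = 1e-9
    for _ in range(n_iter):
        # Each residual is scaled by its weight before entering the usual
        # NMF numerator/denominator; Lam = all-ones recovers plain NMF.
        W *= ((Lam * V) @ H.T) / ((Lam * (W @ H)) @ H.T + eps)
        H *= (W.T @ (Lam * V)) / (W.T @ (Lam * (W @ H)) + eps)
    return W, H
```

Raising the weights on perceptually important time-frequency cells forces the factorization to spend its limited rank on fitting those cells accurately, which is the mechanism the abstract describes for preserving low-energy but audible source characteristics.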
PsySound3: software for acoustical and psychoacoustical analysis of sound recordings
 in Proceedings of the 13th International Conference on Auditory Display
, 2007
"... This paper describes a software project called PsySound3. This software provides an accessible platform for the analysis of sound recordings using procedures applied in acoustics and psychoacoustics. Acoustical analysis methods include a sound level meter module, as well as processes such as Fourier ..."
Abstract

Cited by 9 (2 self)
 Add to MetaCart
(Show Context)
This paper describes a software project called PsySound3. The software provides an accessible platform for the analysis of sound recordings using procedures applied in acoustics and psychoacoustics. Acoustical analysis methods include a sound level meter module, as well as processes such as the Fourier transform, cepstrum, Hilbert transform, and autocorrelation. Psychoacoustical models include dynamic loudness, sharpness, roughness, loudness fluctuation, pitch height, and pitch strength. Results are presented as numbers, auditory graphs, and visual graphs. The software is modular, allowing additional analysis methods to be contributed, and several additional analysis modules are planned. The software is distributed freely via www.psysound.org. This paper illustrates some of the analysis possibilities by using auditory alarms as examples. [Keywords: Sound analysis, Psychoacoustics, Software]
Statistical Modeling and Synthesis of Intrinsic Structures in Impact Sounds
, 2007
"... The views and conclusions contained in this document are those of the author and should not be interpreted as representing the official policies, either expressed or implied, of any sponsoring institution, the U.S. government or any other entity. Acknowledgements I would like to thank my supervisor, ..."
Abstract

Cited by 1 (1 self)
 Add to MetaCart
The views and conclusions contained in this document are those of the author and should not be interpreted as representing the official policies, either expressed or implied, of any sponsoring institution, the U.S. government or any other entity. Acknowledgements I would like to thank my supervisor, Dr. Michael Lewicki, for all the help, guidance and encouragement he gave me throughout the years I spent at Carnegie Mellon University. I also thank him for always giving me the freedom to explore the paths I believed in and found most interesting. I would also like to thank everyone else in the Laboratory for Computational Perception and Statistical Learning: Dr. Evan Smith, Yan Karklin, Xuejing Chen, Daniel Leeds, and especially Dr. Eizaburo Doi and Doru Balcan for all the very useful discussions and interactions. Many thanks also to Doru, Yan and Daniel for proofreading parts of this dissertation. It was a great pleasure to work and interact with Dr. Roger Dannenberg, from whom I have learned a great deal on audio and acoustics. I thank him for having me as a TA for both of his computer music courses. I also thank him for all the help and very useful feedback he gave me. I would like to thank Drs. Richard Stern, Daniel Ellis, Tom Cortina, and Mark Kahr for their interest in my work and all the valuable advice, comments, and feedback they gave me. I also