Sound source separation in monaural music signals (2006)

by T. Virtanen
Results 1 - 10 of 36

Analysis of polyphonic audio using source-filter model and non-negative matrix factorization

by Tuomas Virtanen, Anssi Klapuri - in Advances in Models for Acoustic Processing, Neural Information Processing Systems Workshop, 2006
Abstract - Cited by 20 (3 self)
Framework for (polyphonic) audio — a linear signal model for the magnitude spectrum x_t(k): x̂_t(k) = Σ_{n=1}^{N} …
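The truncated model above is a standard linear (NMF-style) spectrogram model: each observed magnitude spectrum is approximated as a non-negative weighted sum of basis spectra. A minimal numpy sketch of that idea (the names `gains` and `basis` are illustrative, not from the paper):

```python
import numpy as np

def model_spectrum(gains, basis):
    """Linear signal model: x_hat[t, k] = sum_n gains[t, n] * basis[n, k].

    gains: (T, N) non-negative time-varying gains, one column per component
    basis: (N, K) non-negative basis spectra
    returns: (T, K) modelled magnitude spectrogram
    """
    return gains @ basis

# Toy example: two basis spectra active over three frames
basis = np.array([[1.0, 0.0, 0.5],
                  [0.0, 1.0, 0.5]])
gains = np.array([[1.0, 0.0],
                  [0.0, 2.0],
                  [1.0, 1.0]])
x_hat = model_spectrum(gains, basis)
```

With the gains constrained non-negative and estimated per frame, each basis row plays the role of one component's spectral template.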

Monaural Musical Sound Separation Based on Pitch and Common Amplitude Modulation

by Yipeng Li, John Woodruff, Student Member, Deliang Wang
Abstract - Cited by 20 (1 self)
Abstract—Monaural musical sound separation has been extensively studied recently. An important problem in separation of pitched musical sounds is the estimation of time–frequency regions where harmonics overlap. In this paper, we propose a sinusoidal modeling-based separation system that can effectively resolve overlapping harmonics. Our strategy is based on the observations that harmonics of the same source have correlated amplitude envelopes and that the change in phase of a harmonic is related to the instrument’s pitch. We use these two observations in a least squares estimation framework for separation of overlapping harmonics. The system directly distributes mixture energy for harmonics that are unobstructed by other sources. Quantitative evaluation of the proposed system is shown when ground truth pitch information is available, when rough pitch estimates are provided in the form of a MIDI score, and finally, when a multipitch tracking algorithm is used. We also introduce a technique to improve the accuracy of rough pitch estimates. Results show that the proposed system significantly outperforms related monaural musical sound separation systems. Index Terms—Common amplitude modulation (CAM), musical sound separation, sinusoidal modeling, time–frequency masking, underdetermined sound separation.
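The least-squares use of common amplitude modulation described in the abstract can be illustrated with a toy sketch: the envelope of an overlapped harmonic region is modelled as a non-negative weighted sum of reference envelopes taken from each source's unobstructed harmonics. This only illustrates the CAM principle, not the authors' full sinusoidal-modeling system, and all names here are illustrative:

```python
import numpy as np

def resolve_overlap(mix_env, ref_envs):
    """Distribute the amplitude envelope of an overlapped harmonic region
    between sources via least squares.

    CAM assumes harmonics of the same source have correlated amplitude
    envelopes, so the mixture envelope is modelled as a non-negative
    weighted sum of reference envelopes from each source's unobstructed
    harmonics.

    mix_env:  (T,) envelope measured in the overlapped region
    ref_envs: (S, T) one reference envelope per source
    returns:  (S, T) estimated per-source envelopes
    """
    A = ref_envs.T                          # (T, S) design matrix
    w, *_ = np.linalg.lstsq(A, mix_env, rcond=None)
    w = np.clip(w, 0.0, None)               # amplitudes must be non-negative
    return w[:, None] * ref_envs

# Synthetic check: the mixture is an exact combination of the references
ref = np.array([[1.0, 2.0, 1.0],            # source 1 reference envelope
                [1.0, 1.0, 1.0]])           # source 2 reference envelope
mix = 0.5 * ref[0] + 2.0 * ref[1]
est = resolve_overlap(mix, ref)
```

Because the synthetic mixture is an exact combination, the least-squares weights recover the true scaling of each source's envelope.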

Citation Context

... amount of online music data has exploded. A solution to this problem allows for efficient audio coding, accurate content-based analysis, and sophisticated manipulation of musical signals [18], [25], [31], [39]. In this paper, we address the problem of monaural musical sound separation, where multiple harmonic instruments are recorded by a single microphone or mixed to a single channel. Broadly speaki...

Non-negative matrix deconvolution in noise robust speech recognition

by Antti Hurmalainen, Jort Gemmeke, Tuomas Virtanen - in ICASSP 2011
Abstract - Cited by 9 (7 self)
High noise robustness has been achieved in speech recognition by using sparse exemplar-based methods with spectrogram windows spanning up to 300 ms. A downside is that a large exemplar dictionary is required to cover sufficiently many spectral patterns and their temporal alignments within windows. We propose a recognition system based on a shift-invariant convolutive model, where exemplar activations at all the possible temporal positions jointly reconstruct an utterance. Recognition rates are evaluated using the AURORA-2 database, containing spoken digits with noise ranging from clean speech to -5 dB SNR. We obtain results superior to those where the activations were found independently for each overlapping window. Index Terms — Automatic speech recognition, noise robustness, deconvolution, sparsity, exemplar-based

Citation Context

... used successfully for sound source separation in music and speech applications [10, 11]. The entries of the activation matrix are initialised to unity values, and the following update rule (based on [12]) is applied iteratively:

X = X ⊗ [ Σ_{t=1}^{T} A_t^T · (Y_utt ⊘ Ψ_utt)^{←(t−1)} ] ⊘ [ Λ + Σ_{t=1}^{T} A_t^T · 1^{←(t−1)} ], (3)

where ⊗ is elementwise multiplication, ⊘ and all divisions are likewise elementwise, ^{←(t−1)} denotes shifting the columns left by t−1 frames, and Λ is a sparsity matr...
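A small numpy sketch of this multiplicative update, assuming the convolutive model Ψ = Σ_t A_t · (X shifted right by t−1 frames) and a scalar sparsity penalty standing in for the matrix Λ (variable names are illustrative; this is a sketch of the technique, not the authors' implementation):

```python
import numpy as np

def shift_right(M, s):
    """Shift columns of M right by s frames, zero-padding on the left."""
    if s == 0:
        return M.copy()
    out = np.zeros_like(M)
    out[:, s:] = M[:, :-s]
    return out

def shift_left(M, s):
    """Shift columns of M left by s frames, zero-padding on the right."""
    if s == 0:
        return M.copy()
    out = np.zeros_like(M)
    out[:, :-s] = M[:, s:]
    return out

def nmd_activations(Y, A, X, lam=0.0, n_iter=100, eps=1e-12):
    """Iterate a multiplicative update in the spirit of rule (3): the
    activations X jointly reconstruct the utterance Y through the
    convolutive model Psi = sum_t A[t] @ shift_right(X, t).

    Y:   (F, U) observed magnitude spectrogram
    A:   (T, F, N) exemplar bases spanning T frames each
    X:   (N, U) activations, initialised to unity values
    lam: scalar sparsity penalty (the paper uses a matrix Lambda)
    """
    T = A.shape[0]
    for _ in range(n_iter):
        Psi = sum(A[t] @ shift_right(X, t) for t in range(T))
        ratio = Y / (Psi + eps)
        num = sum(shift_left(A[t].T @ ratio, t) for t in range(T))
        den = lam + sum(shift_left(A[t].T @ np.ones_like(Y), t) for t in range(T))
        X = X * num / (den + eps)
    return X

# Toy usage: recover activations for an exactly reconstructable mixture
rng = np.random.default_rng(0)
T, F, N, U = 2, 6, 3, 10
A = rng.random((T, F, N))
X_true = rng.random((N, U))
Y = sum(A[t] @ shift_right(X_true, t) for t in range(T))
X_est = nmd_activations(Y, A, np.ones((N, U)))
```

Multiplicative updates of this form keep the activations non-negative by construction, which is why the unity initialisation suffices.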

MUSICAL SOUND SEPARATION USING PITCH-BASED LABELING AND BINARY TIME-FREQUENCY MASKING

by Yipeng Li, Deliang Wang
Abstract - Cited by 5 (2 self)
Monaural musical sound separation attempts to segregate different instrument lines from single-channel polyphonic music. We propose a system that decomposes an input into time-frequency units using an auditory filterbank and utilizes pitch to label which instrument line each time-frequency unit is assigned to. The system is conceptually simple and computationally efficient. Systematic evaluation shows that, despite its simplicity, the proposed system achieves a competitive level of performance. Index Terms — musical sound separation, computational auditory scene analysis, pitch-based labeling
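The pitch-based labeling step can be sketched as follows: each frequency channel is assigned to the instrument whose harmonic series passes closest to the channel's centre frequency, yielding one binary mask per source. This toy version labels a single frame and ignores the auditory-filterbank details of the actual system:

```python
import numpy as np

def binary_mask_by_pitch(freqs, pitches, max_harm=10):
    """Label each frequency channel with the source whose nearest harmonic
    is closest, producing binary time-frequency masks.

    freqs:   (K,) channel centre frequencies in Hz
    pitches: (S,) fundamental frequencies of the sources in Hz
    returns: (S, K) binary masks; each channel is assigned to one source
    """
    dist = np.empty((len(pitches), len(freqs)))
    for s, f0 in enumerate(pitches):
        harmonics = f0 * np.arange(1, max_harm + 1)
        # distance from each channel to the nearest harmonic of this pitch
        dist[s] = np.abs(freqs[None, :] - harmonics[:, None]).min(axis=0)
    labels = dist.argmin(axis=0)
    masks = np.zeros_like(dist)
    masks[labels, np.arange(len(freqs))] = 1.0
    return masks

# Two sources with fundamentals 100 Hz and 150 Hz, four channels
freqs = np.array([100.0, 200.0, 310.0, 440.0])
masks = binary_mask_by_pitch(freqs, np.array([100.0, 150.0]))
```

Because the labeling is binary, each time-frequency unit contributes its energy to exactly one separated source, which is what makes the approach computationally cheap.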

Citation Context

...y challenging problem. On the other hand, a system possessing such functionality allows more efficient audio coding, accurate content-based analysis, and sophisticated manipulation of musical signals [1]. In music, multiple instruments often play simultaneously. The polyphonic nature of music creates unique problems for monaural musical sound separation. One such problem is overlapping harmonics where...

SPECTRAL COVARIANCE IN PRIOR DISTRIBUTIONS OF NON-NEGATIVE MATRIX FACTORIZATION BASED SPEECH SEPARATION

by Tuomas Virtanen
Abstract - Cited by 5 (1 self)
This paper proposes an algorithm for modeling the covariance of the spectrum in the prior distributions of non-negative matrix factorization (NMF) based sound source separation. Supervised NMF estimates a set of spectrum basis vectors for each source, and then represents a mixture signal using them. When the exact characteristics of the sources are not known in advance, it is advantageous to train prior distributions of spectra instead of fixed spectra. Since the frequency bands in natural sound sources are strongly correlated, we model the distributions with full-covariance Gaussian distributions. Algorithms for training and applying the distributions are presented. The proposed methods produce better separation quality than the reference methods. Demonstration signals are available at www.cs.tut.fi/~tuomasv.
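The idea of a trained full-covariance spectral prior can be sketched with numpy: fit a mean and covariance to training spectra, then evaluate an unnormalised Gaussian log-density that could act as a regulariser on estimated spectra. This is a sketch of the concept only, not the paper's training or application algorithms:

```python
import numpy as np

def fit_spectral_prior(train_spectra):
    """Fit a full-covariance Gaussian prior over spectrum vectors.

    train_spectra: (M, K) training spectra, one per row
    returns: mean (K,) and covariance (K, K) capturing the correlation
    between frequency bands
    """
    mean = train_spectra.mean(axis=0)
    cov = np.cov(train_spectra, rowvar=False)
    cov += 1e-6 * np.eye(cov.shape[0])   # regularise for invertibility
    return mean, cov

def log_prior(spectrum, mean, cov):
    """Unnormalised Gaussian log-density, usable as a prior term in an
    NMF-style objective; higher means more plausible."""
    d = spectrum - mean
    return -0.5 * d @ np.linalg.solve(cov, d)

# Toy training set: noisy copies of one underlying spectrum
rng = np.random.default_rng(0)
base = rng.random(4)
train = base + 0.1 * rng.standard_normal((50, 4))
mean, cov = fit_spectral_prior(train)
```

A full covariance, rather than a diagonal one, lets the prior reward the correlated band structure that natural sound sources exhibit.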

PROBABILISTIC LATENT TENSOR FACTORIZATION FRAMEWORK FOR AUDIO MODELING

by Ali Taylan Cemgil, Yusuf Cem Sübakan
Abstract - Cited by 4 (0 self)
This paper introduces probabilistic latent tensor factorization (PLTF) as a general framework for hierarchical modeling of audio. This framework combines practical aspects of graphical modeling of machine learning with tensor factorization models. Once a model is constructed in the PLTF framework, the estimation algorithm is immediately available. We illustrate our approach using several popular models such as NMF or NMF2D and provide extensions with simulation results on real data for key audio processing tasks such as restoration and source separation.

Citation Context

...Table 2. A related model, proposed by FitzGerald et al. [13], is the Source-Filter Sinusoidal Shifted Nonnegative Tensor Factorization model (SF-SSNTF). A model in the same spirit is also proposed by Klapuri et al. in [14]. The model mimics physically inspired source-filter models of audio production in the spectral domain, such as a harmonic excitation multiplied by the spectral envelope of a body response...

On the use of masking filters in sound source separation

by Derry FitzGerald, Rajesh Jaiswal - in Proc. of 15th International Conference on Digital Audio Effects, 2012
Abstract - Cited by 4 (2 self)

Citation Context

...used in numerous sound source separation algorithms including [6], where it was used in the context of drum sound separation, user-assisted separation in [7], and for source-filter based separation in [8]. In effect, this approach allocates the energy in a given time-frequency bin across the sources according to a least-squares best fit. Another advantage of this approach is that the separated sources ...

Automatic Transcription of Pitch Content in Music and Selected Applications

by Matti Ryynänen
Abstract - Cited by 2 (0 self)
Transcription of music refers to the analysis of a music signal in order to produce a parametric representation of the sounding notes in the signal. This is conventionally carried out by listening to a piece of music and writing down the symbols of common musical notation to represent the occurring notes in the piece. Automatic transcription of music refers to the extraction of such representations using signal-processing methods. This thesis concerns the automatic transcription of pitched notes in musical audio and its applications. Emphasis is laid on the transcription of realistic polyphonic music, where multiple pitched and percussive instruments are sounding simultaneously. The methods included in this thesis are based on a framework which combines both low-level acoustic modeling and high-level musicological modeling. The emphasis in the acoustic modeling has been set to note events so that the methods produce discrete-pitch notes with onset times and durations

Citation Context

...e plugin version 2 scheduled for publication in the beginning of 2009. For details, see www.celemony.com. An example application for editing individual notes in polyphonic music was also presented in [130]. Object-based coding of musical audio aims at using high-level musical objects, such as notes, as a basis for audio compression. While MIDI is a highly structured and compact representation of musica...

Bayesian Statistical Methods for Audio and Music Processing

by A. Taylan Cemgil, Simon J. Godsill, Paul Peeling, Nick Whiteley, 2008
Abstract - Cited by 1 (1 self)
Bayesian statistical methods provide a formalism for arriving at solutions to various problems faced in audio processing. In real environments, acoustical conditions and sound sources are highly variable, yet audio signals often possess significant statistical structure. There is a great deal of prior knowledge available about why this statistical structure is present. This includes knowledge of the physical mechanisms by which sounds are generated, the cognitive processes by which sounds are perceived and, in the context of music, the abstract mechanisms by which high-level sound structure is compiled. Bayesian hierarchical techniques provide a natural means for unification of these bodies of prior knowledge, allowing the formulation of highly-structured models for observed audio data and latent processes at various levels of abstraction. They also permit the inclusion of desirable modelling components such as change-point structures and model-order specifications. The resulting models exhibit complex statistical structure and in practice, highly adaptive and powerful computational techniques are needed to perform inference. In this chapter, we review some of the statistical models and associated inference methods developed recently for

Citation Context

...nsion. The basic idea is representing a spectrogram by enforcing a factorisation as X ≈ TV where both T and V are matrices with positive entries (Smaragdis and Brown 2003; Abdallah and Plumbley 2006; Virtanen 2006; Kameoka 2007; Bertin, Badeau, and Richard 2007; Vincent, Bertin, and Badeau 2008). In music signal analysis, T can be interpreted as a codebook of templates, corresponding to spectral shapes of indi...
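The factorisation X ≈ TV described here can be sketched with multiplicative updates for the Euclidean cost (a generic NMF sketch, not any one cited paper's variant; T is the template codebook, V the activations):

```python
import numpy as np

def nmf(X, R, n_iter=500, eps=1e-12, seed=0):
    """Factor X (frequencies x frames) as X ~ T @ V with non-negative
    T (spectral templates) and V (time-varying activations), using
    multiplicative updates that minimise the Euclidean cost."""
    rng = np.random.default_rng(seed)
    K, N = X.shape
    T = rng.random((K, R))
    V = rng.random((R, N))
    for _ in range(n_iter):
        V *= (T.T @ X) / (T.T @ T @ V + eps)
        T *= (X @ V.T) / (T @ V @ V.T + eps)
    return T, V

# Toy spectrogram built from two known templates
templates = np.array([[1.0, 0.0],
                      [0.0, 1.0],
                      [1.0, 1.0]])
acts = np.random.default_rng(1).random((2, 8))
X = templates @ acts
T, V = nmf(X, R=2)
```

On this rank-2 toy data the updates drive the reconstruction T @ V close to X; columns of T then play the codebook role described in the text.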

Non-negative source-filter dynamical system for speech enhancement

by Jonathan Le Roux, John R. Hershey, 2014
Abstract - Cited by 1 (1 self)
Model-based speech enhancement methods, which rely on separately modeling the speech and the noise, have been shown to be powerful in many different problem settings. When the structure of the noise can be arbitrary, which is often the case in practice, model-based methods have to focus on developing good speech models, whose quality will be key to their performance. In this study, we propose a novel probabilistic model for speech enhancement which precisely models the speech by taking into account the underlying speech production process as well as its dynamics. The proposed model follows a source-filter approach where the excitation and filter parts are modeled as non-negative dynamical systems. We present convergence-guaranteed update rules for each latent factor. In order to assess performance, we evaluate our model on a challenging speech enhancement task where the speech is observed under non-stationary noises recorded in a car. We show that our model outperforms state-of-the-art methods in terms of objective measures.

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2019 The Pennsylvania State University