Results 1 - 10 of 36
Analysis of polyphonic audio using source-filter model and non-negative matrix factorization
- in Advances in Models for Acoustic Processing, Neural Information Processing Systems Workshop, 2006
"... •Framework for (polyphonic) audio — linear signal model for magnitude spectrum xt(k): x̂t(k) = N∑ ..."
Abstract
-
Cited by 20 (3 self)
- Add to MetaCart
Abstract: Framework for (polyphonic) audio — linear signal model for the magnitude spectrum x_t(k): x̂_t(k) = Σ^N ...
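The sum in the snippet is cut off, but the form it gestures at is the usual NMF-style linear combination of N basis spectra. A minimal NumPy sketch, assuming that form; the sizes and variable names below are hypothetical, not taken from the paper:

    import numpy as np

    rng = np.random.default_rng(0)

    K, T, N = 257, 100, 8      # frequency bins, frames, components (arbitrary sizes)
    B = rng.random((K, N))     # non-negative basis spectra b_n(k)
    G = rng.random((N, T))     # non-negative time-varying gains g_n(t)

    # Linear model for the magnitude spectrum: x_hat_t(k) = sum_n g_n(t) * b_n(k)
    X_hat = B @ G              # (K, T) reconstructed magnitude spectrogram
    print(X_hat.shape)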
Monaural Musical Sound Separation Based on Pitch and Common Amplitude Modulation
"... Abstract—Monaural musical sound separation has been extensively studied recently. An important problem in separation of pitched musical sounds is the estimation of time–frequency regions where harmonics overlap. In this paper, we propose a sinusoidal modeling-based separation system that can effecti ..."
Abstract
-
Cited by 20 (1 self)
- Add to MetaCart
(Show Context)
Abstract: Monaural musical sound separation has been extensively studied recently. An important problem in separation of pitched musical sounds is the estimation of time–frequency regions where harmonics overlap. In this paper, we propose a sinusoidal modeling-based separation system that can effectively resolve overlapping harmonics. Our strategy is based on the observations that harmonics of the same source have correlated amplitude envelopes and that the change in phase of a harmonic is related to the instrument’s pitch. We use these two observations in a least squares estimation framework for separation of overlapping harmonics. The system directly distributes mixture energy for harmonics that are unobstructed by other sources. Quantitative evaluation of the proposed system is shown when ground truth pitch information is available, when rough pitch estimates are provided in the form of a MIDI score, and finally, when a multipitch tracking algorithm is used. We also introduce a technique to improve the accuracy of rough pitch estimates. Results show that the proposed system significantly outperforms related monaural musical sound separation systems. Index Terms—Common amplitude modulation (CAM), musical sound separation, sinusoidal modeling, time–frequency masking, underdetermined sound separation.
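As a rough illustration of the common-amplitude-modulation idea only (not the authors' actual system, which also exploits phase), the energy of an overlapped harmonic can be split by least-squares fitting the envelopes of unobstructed harmonics of each source to the observed mixture envelope. Everything below is synthetic and the names are hypothetical:

    import numpy as np
    from scipy.optimize import nnls

    rng = np.random.default_rng(1)
    T = 200
    t = np.arange(T)

    # Reference amplitude envelopes taken from unobstructed harmonics
    # of each source (synthetic stand-ins).
    e1 = 1.0 + 0.5 * np.sin(2 * np.pi * t / 50)   # source 1 envelope shape
    e2 = np.exp(-t / 80.0)                        # source 2 envelope shape

    # Observed envelope of the overlapped harmonic: an unknown mix of both.
    true_a = np.array([0.7, 1.3])
    mix = true_a[0] * e1 + true_a[1] * e2 + 0.01 * rng.standard_normal(T)

    # Least-squares split of the overlapped energy, constrained non-negative.
    E = np.column_stack([e1, e2])
    a, _ = nnls(E, mix)

    # Each source's estimated contribution in the overlapped region.
    s1_hat, s2_hat = a[0] * e1, a[1] * e2
    print("estimated scales:", a)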
Non-negative matrix deconvolution in noise robust speech recognition
- in ICASSP 2011
"... High noise robustness has been achieved in speech recognition by using sparse exemplar-based methods with spectrogram windows spanning up to 300 ms. A downside is that a large exemplar dictio-nary is required to cover sufficiently many spectral patterns and their temporal alignments within windows. ..."
Abstract
-
Cited by 9 (7 self)
- Add to MetaCart
(Show Context)
Abstract: High noise robustness has been achieved in speech recognition by using sparse exemplar-based methods with spectrogram windows spanning up to 300 ms. A downside is that a large exemplar dictionary is required to cover sufficiently many spectral patterns and their temporal alignments within windows. We propose a recognition system based on a shift-invariant convolutive model, where exemplar activations at all the possible temporal positions jointly reconstruct an utterance. Recognition rates are evaluated using the AURORA-2 database, containing spoken digits with noise ranging from clean speech to -5 dB SNR. We obtain results superior to those where the activations were found independently for each overlapping window. Index Terms — Automatic speech recognition, noise robustness, deconvolution, sparsity, exemplar-based
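A toy sketch of the shift-invariant (convolutive) reconstruction described above: each exemplar patch can be activated at every frame, and all activations jointly add up to the utterance spectrogram. Shapes and data are invented; this is not the paper's recognition system:

    import numpy as np

    rng = np.random.default_rng(2)

    K, T = 40, 60          # mel bands, frames in the utterance (arbitrary)
    E, W = 5, 10           # number of exemplars, exemplar width in frames

    exemplars = rng.random((E, K, W))   # dictionary of spectrogram patches
    H = rng.random((E, T))              # one activation per exemplar per position

    def reconstruct(exemplars, H):
        # Shift-invariant reconstruction: every activation places a whole
        # exemplar patch starting at that frame, and contributions add up.
        E, K, W = exemplars.shape
        T = H.shape[1]
        X_hat = np.zeros((K, T))
        for e in range(E):
            for t in range(T):
                width = min(W, T - t)            # clip at the utterance end
                X_hat[:, t:t + width] += H[e, t] * exemplars[e, :, :width]
        return X_hat

    X_hat = reconstruct(exemplars, H)
    print(X_hat.shape)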
MUSICAL SOUND SEPARATION USING PITCH-BASED LABELING AND BINARY TIME-FREQUENCY MASKING
"... Monaural musical sound separation attempts to segregate different instrument lines from single-channel polyphonic music. We propose a system that decomposes an input into timefrequency units using an auditory filterbank and utilizes pitch to label which instrument line each time-frequency unit is as ..."
Abstract
-
Cited by 5 (2 self)
- Add to MetaCart
(Show Context)
Abstract: Monaural musical sound separation attempts to segregate different instrument lines from single-channel polyphonic music. We propose a system that decomposes an input into time-frequency units using an auditory filterbank and utilizes pitch to label which instrument line each time-frequency unit is assigned to. The system is conceptually simple and computationally efficient. Systematic evaluation shows that, despite its simplicity, the proposed system achieves a competitive level of performance. Index Terms — musical sound separation, computational auditory scene analysis, pitch-based labeling
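A minimal sketch of pitch-based labeling with binary time-frequency masking, assuming each unit is simply assigned to the pitch track whose harmonic series passes closest to the channel's center frequency; this is an illustrative simplification, not the authors' labeling rule:

    import numpy as np

    def binary_pitch_mask(center_freqs, pitches, n_harm=10):
        # center_freqs: (C,) filterbank channel center frequencies in Hz
        # pitches: (S, T) per-frame pitch estimates for S instrument lines
        C, = center_freqs.shape
        S, T = pitches.shape
        masks = np.zeros((S, C, T), dtype=bool)
        for t in range(T):
            dist = np.full((S, C), np.inf)
            for s in range(S):
                f0 = pitches[s, t]
                if f0 <= 0:
                    continue                      # source silent in this frame
                harm = f0 * np.arange(1, n_harm + 1)
                dist[s] = np.abs(center_freqs[:, None] - harm[None, :]).min(axis=1)
            winner = dist.argmin(axis=0)          # closest source per channel
            for s in range(S):
                masks[s, :, t] = winner == s
        return masks

    # Hypothetical toy input: 64 channels, two pitch tracks, 5 frames.
    center_freqs = np.linspace(80, 4000, 64)
    pitches = np.array([[220.0] * 5, [330.0] * 5])
    masks = binary_pitch_mask(center_freqs, pitches)
    print(masks.shape, int(masks.sum()))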
SPECTRAL COVARIANCE IN PRIOR DISTRIBUTIONS OF NON-NEGATIVE MATRIX FACTORIZATION BASED SPEECH SEPARATION
"... This paper proposes an algorithm for modeling the covariance of the spectrum in the prior distributions of non-negative matrix factorization (NMF) based sound source separation. Supervised NMF estimates a set of spectrum basis vectors for each source, and then represents a mixture signal using them. ..."
Abstract
-
Cited by 5 (1 self)
- Add to MetaCart
Abstract: This paper proposes an algorithm for modeling the covariance of the spectrum in the prior distributions of non-negative matrix factorization (NMF) based sound source separation. Supervised NMF estimates a set of spectrum basis vectors for each source, and then represents a mixture signal using them. When the exact characteristics of the sources are not known in advance, it is advantageous to train prior distributions of spectra instead of fixed spectra. Since the frequency bands in natural sound sources are strongly correlated, we model the distributions with full-covariance Gaussian distributions. Algorithms for training and applying the distributions are presented. The proposed methods produce better separation quality than the reference methods. Demonstration signals are available at www.cs.tut.fi/~tuomasv.
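A small sketch of the key ingredient, a full-covariance Gaussian prior over spectra that rewards the inter-band correlation seen in training data; the training data, domain, and scoring below are invented for illustration and do not reproduce the paper's algorithm:

    import numpy as np

    rng = np.random.default_rng(3)
    K, n_train = 30, 500

    # Hypothetical training spectra for one source (log domain), with strong
    # correlation between frequency bands via a shared random offset.
    base = rng.standard_normal((n_train, 1))
    train = base + 0.3 * rng.standard_normal((n_train, K))

    # Fit a full-covariance Gaussian prior to the training spectra.
    mu = train.mean(axis=0)
    Sigma = np.cov(train, rowvar=False) + 1e-6 * np.eye(K)   # regularised
    Sigma_inv = np.linalg.inv(Sigma)
    _, logdet = np.linalg.slogdet(Sigma)

    def log_prior(s):
        # Log-density of a spectrum s under the full-covariance Gaussian prior.
        d = s - mu
        return -0.5 * (d @ Sigma_inv @ d + logdet + K * np.log(2 * np.pi))

    # A spectrum that follows the learned band correlation typically scores
    # higher than one with the same marginal deviation but random signs.
    correlated = mu + 0.5
    uncorrelated = mu + 0.5 * rng.choice([-1, 1], size=K)
    print(log_prior(correlated), log_prior(uncorrelated))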
PROBABILISTIC LATENT TENSOR FACTORIZATION FRAMEWORK FOR AUDIO MODELING
"... This paper introduces probabilistic latent tensor factorization (PLTF) as a general framework for hierarchical modeling of audio. This framework combines practical aspects of graphical modeling of machine learning with tensor factorization models. Once a model is constructed in the PLTF framework, t ..."
Abstract
-
Cited by 4 (0 self)
- Add to MetaCart
(Show Context)
This paper introduces probabilistic latent tensor factorization (PLTF) as a general framework for hierarchical modeling of audio. This framework combines practical aspects of graphical modeling of machine learning with tensor factorization models. Once a model is constructed in the PLTF framework, the estimation algorithm is immediately available. We illustrate our approach using several popular models such as NMF or NMF2D and provide extensions with simulation results on real data for key audio processing tasks such as restoration and source separation.
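To give a flavour of the "declare the model, get the estimator" idea, the sketch below states plain NMF as an einsum contraction and applies the standard multiplicative KL updates; the real PLTF framework is far more general, and this is an illustrative assumption rather than the authors' code:

    import numpy as np

    rng = np.random.default_rng(4)

    F, T, I = 20, 50, 4
    X = rng.random((F, T)) + 1e-3        # observed non-negative data

    # Model declared as a contraction: X_hat[f,t] = sum_i B[f,i] * G[i,t].
    model = "fi,it->ft"
    B = rng.random((F, I))
    G = rng.random((I, T))

    for _ in range(200):
        X_hat = np.einsum(model, B, G)
        R = X / X_hat                    # ratio term of the KL gradient
        ones = np.ones_like(X)
        # Multiplicative updates: contract the ratio (and ones) with the other factor.
        B *= np.einsum("ft,it->fi", R, G) / np.einsum("ft,it->fi", ones, G)
        X_hat = np.einsum(model, B, G)
        R = X / X_hat
        G *= np.einsum("fi,ft->it", B, R) / np.einsum("fi,ft->it", B, ones)

    print(float(np.abs(X - np.einsum(model, B, G)).mean()))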
On the use of masking filters in sound source separation
- in Proc. of 15th International Conference on Digital Audio Effects, 2012
"... This Conference Paper is brought to you for free and open access by the ..."
Abstract
-
Cited by 4 (2 self)
- Add to MetaCart
(Show Context)
This Conference Paper is brought to you for free and open access by the
Automatic Transcription of Pitch Content in Music and Selected Applications
"... Transcription of music refers to the analysis of a music signal in order to produce a parametric representation of the sounding notes in the signal. This is conventionally carried out by listening to a piece of music and writing down the symbols of common musical notation to represent the occurring ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
(Show Context)
Abstract: Transcription of music refers to the analysis of a music signal in order to produce a parametric representation of the sounding notes in the signal. This is conventionally carried out by listening to a piece of music and writing down the symbols of common musical notation to represent the occurring notes in the piece. Automatic transcription of music refers to the extraction of such representations using signal-processing methods. This thesis concerns the automatic transcription of pitched notes in musical audio and its applications. Emphasis is laid on the transcription of realistic polyphonic music, where multiple pitched and percussive instruments are sounding simultaneously. The methods included in this thesis are based on a framework which combines both low-level acoustic modeling and high-level musicological modeling. The emphasis in the acoustic modeling has been set to note events so that the methods produce discrete-pitch notes with onset times and durations.
Bayesian Statistical Methods for Audio and Music Processing
, 2008
"... Bayesian statistical methods provide a formalism for arriving at solutions to various problems faced in audio processing. In real environments, acoustical conditions and sound sources are highly variable, yet audio signals often possess significant statistical structure. There is a great deal of pri ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
(Show Context)
Abstract: Bayesian statistical methods provide a formalism for arriving at solutions to various problems faced in audio processing. In real environments, acoustical conditions and sound sources are highly variable, yet audio signals often possess significant statistical structure. There is a great deal of prior knowledge available about why this statistical structure is present. This includes knowledge of the physical mechanisms by which sounds are generated, the cognitive processes by which sounds are perceived and, in the context of music, the abstract mechanisms by which high-level sound structure is compiled. Bayesian hierarchical techniques provide a natural means for unification of these bodies of prior knowledge, allowing the formulation of highly-structured models for observed audio data and latent processes at various levels of abstraction. They also permit the inclusion of desirable modelling components such as change-point structures and model-order specifications. The resulting models exhibit complex statistical structure and in practice, highly adaptive and powerful computational techniques are needed to perform inference. In this chapter, we review some of the statistical models and associated inference methods developed recently for ...
Non-negative source-filter dynamical system for speech enhancement
, 2014
"... Model-based speech enhancement methods, which rely on separately modeling the speech and the noise, have been shown to be powerful in many different problem settings. When the structure of the noise can be arbitrary, which is often the case in practice, model- based methods have to focus on developi ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
Abstract: Model-based speech enhancement methods, which rely on separately modeling the speech and the noise, have been shown to be powerful in many different problem settings. When the structure of the noise can be arbitrary, which is often the case in practice, model-based methods have to focus on developing good speech models, whose quality will be key to their performance. In this study, we propose a novel probabilistic model for speech enhancement which precisely models the speech by taking into account the underlying speech production process as well as its dynamics. The proposed model follows a source-filter approach where the excitation and filter parts are modeled as non-negative dynamical systems. We present convergence-guaranteed update rules for each latent factor. In order to assess performance, we evaluate our model on a challenging speech enhancement task where the speech is observed under non-stationary noises recorded in a car. We show that our model outperforms state-of-the-art methods in terms of objective measures.
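A generative toy sketch in the spirit of the source-filter decomposition described above: the power spectrogram is modeled as an element-wise product of an excitation part (harmonic combs) and a filter part (smooth envelopes), each with non-negative first-order gain dynamics. The dictionaries, dynamics, and parameters are invented for illustration and are not the authors' model:

    import numpy as np

    rng = np.random.default_rng(5)

    K, T = 257, 50                    # frequency bins, frames
    freqs = np.linspace(0, 8000, K)

    # Excitation dictionary: harmonic combs for a few candidate pitches.
    pitches = np.array([110.0, 146.8, 196.0])
    W_ex = np.stack([
        np.exp(-0.5 * ((freqs[:, None] - f0 * np.arange(1, 40)[None, :]) / 30.0) ** 2).sum(axis=1)
        for f0 in pitches
    ], axis=1)                        # (K, 3)

    # Filter dictionary: smooth spectral envelopes (broad Gaussian bumps).
    centers = np.array([500.0, 1500.0, 3000.0, 5000.0])
    W_fl = np.exp(-0.5 * ((freqs[:, None] - centers[None, :]) / 800.0) ** 2)   # (K, 4)

    def simulate_gains(n, T, rho=0.9):
        # Non-negative first-order dynamics: h_t = (A @ h_{t-1}) * positive innovation.
        A = rho * np.eye(n)
        h = np.zeros((n, T))
        h[:, 0] = rng.random(n) + 0.1
        for t in range(1, T):
            h[:, t] = (A @ h[:, t - 1]) * rng.gamma(shape=20.0, scale=0.05, size=n)
        return h

    H_ex = simulate_gains(W_ex.shape[1], T)
    H_fl = simulate_gains(W_fl.shape[1], T)

    # Source-filter model of the power spectrogram: excitation times filter.
    V_hat = (W_ex @ H_ex) * (W_fl @ H_fl)          # (K, T), non-negative
    print(V_hat.shape, bool(V_hat.min() >= 0))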