Results 11–20 of 189
Signal Processing for Music Analysis, 2011
"... Music signal processing may appear to be the junior relation of the large and mature field of speech signal processing, not least because many techniques and representations originally developed for speech have been applied to music, often with good results. However, music signals possess specific ..."
Abstract

Cited by 25 (3 self)
 Add to MetaCart
(Show Context)
Music signal processing may appear to be the junior relation of the large and mature field of speech signal processing, not least because many techniques and representations originally developed for speech have been applied to music, often with good results. However, music signals possess specific acoustic and structural characteristics that distinguish them from spoken language or other nonmusical signals. This paper provides an overview of some signal analysis techniques that specifically address musical dimensions such as melody, harmony, rhythm, and timbre. We will examine how particular characteristics of music signals impact and determine these techniques, and we highlight a number of novel music analysis and retrieval tasks that such processing makes possible. Our goal is to demonstrate that, to be successful, music audio signal processing techniques must be informed by a deep and thorough insight into the nature of music itself.
Analysis of polyphonic audio using source-filter model and non-negative matrix factorization, in Advances in Models for Acoustic Processing, Neural Information Processing Systems Workshop, 2006
"... •Framework for (polyphonic) audio — linear signal model for magnitude spectrum xt(k): x̂t(k) = N∑ ..."
Abstract

Cited by 20 (3 self)
 Add to MetaCart
(Show Context)
• Framework for (polyphonic) audio: a linear signal model for the magnitude spectrum x_t(k): x̂_t(k) = ∑_{n=1}^{N} …
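The snippet breaks off mid-equation, but the core idea, approximating each frame of a non-negative magnitude spectrum by a linear combination of N basis spectra, is the standard NMF setting. As a hedged illustration (plain NMF with a Euclidean cost and Lee-Seung multiplicative updates, not the paper's full source-filter model; all names are mine):

```python
import numpy as np

def nmf(X, n_components, n_iter=200, eps=1e-9, seed=0):
    """Factor a non-negative magnitude spectrogram X (freq x time) into
    basis spectra W and activations H with Lee-Seung multiplicative
    updates for the Euclidean cost ||X - WH||^2."""
    rng = np.random.default_rng(seed)
    F, T = X.shape
    W = rng.random((F, n_components)) + eps
    H = rng.random((n_components, T)) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# toy usage: an exactly rank-2 "spectrogram" built from two templates
W_true = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
H_true = np.abs(np.random.default_rng(1).random((2, 10))) + 0.1
X = W_true @ H_true
W, H = nmf(X, n_components=2)
rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
```

On exactly low-rank toy data like this, the relative reconstruction error drops close to zero within a few hundred iterations.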
Monaural Musical Sound Separation Based on Pitch and Common Amplitude Modulation
"... Abstract—Monaural musical sound separation has been extensively studied recently. An important problem in separation of pitched musical sounds is the estimation of time–frequency regions where harmonics overlap. In this paper, we propose a sinusoidal modelingbased separation system that can effecti ..."
Abstract

Cited by 20 (1 self)
 Add to MetaCart
Monaural musical sound separation has been extensively studied recently. An important problem in the separation of pitched musical sounds is the estimation of time–frequency regions where harmonics overlap. In this paper, we propose a sinusoidal-modeling-based separation system that can effectively resolve overlapping harmonics. Our strategy is based on the observations that harmonics of the same source have correlated amplitude envelopes and that the change in phase of a harmonic is related to the instrument's pitch. We use these two observations in a least-squares estimation framework for separation of overlapping harmonics. The system directly distributes mixture energy for harmonics that are unobstructed by other sources. Quantitative evaluation of the proposed system is shown when ground-truth pitch information is available, when rough pitch estimates are provided in the form of a MIDI score, and finally, when a multipitch tracking algorithm is used. We also introduce a technique to improve the accuracy of rough pitch estimates. Results show that the proposed system significantly outperforms related monaural musical sound separation systems.
Index Terms: common amplitude modulation (CAM), musical sound separation, sinusoidal modeling, time–frequency masking, underdetermined sound separation
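As a toy illustration of the amplitude part of this strategy (the phase-based part is omitted): under common amplitude modulation, the envelope of an overlapped harmonic region can be explained as a non-negative combination of envelopes taken from clean harmonics of each source, fit by least squares. The function below is an illustrative sketch, not the paper's implementation:

```python
import numpy as np

def resolve_overlap(mix_env, ref_env1, ref_env2):
    """Toy CAM-based overlap resolution: model the amplitude envelope of
    an overlapped harmonic as a non-negative linear combination of
    reference envelopes taken from clean (non-overlapped) harmonics of
    the two sources, and fit the two gains by least squares."""
    A = np.column_stack([ref_env1, ref_env2])
    coef, *_ = np.linalg.lstsq(A, mix_env, rcond=None)
    coef = np.clip(coef, 0.0, None)  # envelope gains must stay non-negative
    return coef[0] * ref_env1, coef[1] * ref_env2

# toy usage: two distinct envelopes mixed with known gains 0.7 and 0.3
t = np.linspace(0.0, 1.0, 100)
env1 = np.exp(-3.0 * t)         # decaying harmonic of source 1
env2 = 1.0 - np.exp(-3.0 * t)   # rising harmonic of source 2
mix = 0.7 * env1 + 0.3 * env2
est1, est2 = resolve_overlap(mix, env1, env2)
```

With noiseless envelopes the least-squares fit recovers the mixing gains exactly; in practice the envelopes would be noisy sinusoidal-track amplitudes.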
Combining Pitch-Based Inference and Non-Negative Spectrogram Factorization in Separating Vocals from Polyphonic Music
"... This paper proposes a novel algorithm for separating vocals from polyphonic music accompaniment. Based on pitch estimation, the method first creates a binary mask indicating timefrequency segments in the magnitude spectrogram where harmonic content of the vocal signal is present. Second, nonnegative ..."
Abstract

Cited by 20 (3 self)
 Add to MetaCart
(Show Context)
This paper proposes a novel algorithm for separating vocals from polyphonic music accompaniment. Based on pitch estimation, the method first creates a binary mask indicating time–frequency segments in the magnitude spectrogram where harmonic content of the vocal signal is present. Second, non-negative matrix factorization (NMF) is applied to the non-vocal segments of the spectrogram in order to learn a model for the accompaniment. NMF predicts the amount of noise in the vocal segments, which allows separating vocals and noise even when they overlap in time and frequency. Simulations with commercial and synthesized acoustic material show average improvements of 1.3 dB and 1.8 dB, respectively, in comparison with a reference algorithm based on sinusoidal modeling, and the perceptual quality of the separated vocals is also clearly improved. The method was also tested in aligning separated vocals with textual lyrics, where it produced better results than the reference method.
Index Terms: sound source separation, non-negative matrix factorization, unsupervised learning, pitch estimation
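A minimal sketch of the second stage, learning the accompaniment model only from bins outside a given vocal mask, could look like this; the masked Euclidean multiplicative updates and every name here are illustrative assumptions rather than the paper's exact algorithm:

```python
import numpy as np

def separate_vocals(V, vocal_mask, rank=4, n_iter=150, eps=1e-9, seed=0):
    """Sketch of the two-stage idea: fit an NMF accompaniment model only
    to the time-frequency bins outside the (given) vocal mask, then use
    its prediction inside the mask to split mixture energy.
    V: magnitude spectrogram (freq x time); vocal_mask: bool array,
    True where vocal harmonics were detected."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, rank)) + eps
    H = rng.random((rank, T)) + eps
    keep = (~vocal_mask).astype(float)  # 1 on accompaniment-only bins
    Vk = V * keep
    for _ in range(n_iter):
        # weighted (masked) Euclidean multiplicative updates
        H *= (W.T @ Vk) / (W.T @ ((W @ H) * keep) + eps)
        W *= (Vk @ H.T) / (((W @ H) * keep) @ H.T + eps)
    accompaniment = W @ H
    vocals = np.clip(V - accompaniment, 0.0, None) * vocal_mask
    return vocals, accompaniment

# toy usage: rank-2 accompaniment plus extra energy inside the mask
rng = np.random.default_rng(1)
A = rng.random((20, 2)) @ rng.random((2, 30))
vocal_mask = np.zeros((20, 30), dtype=bool)
vocal_mask[5:8, 10:20] = True
V = A + 2.0 * vocal_mask
vocals, accompaniment = separate_vocals(V, vocal_mask)
```

The key design point is that the accompaniment model is trained only on bins the mask marks as non-vocal, yet its reconstruction extrapolates into the masked region, which is what allows splitting overlapping energy there.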
Mixtures of Gamma Priors for Non-Negative Matrix Factorization Based Speech Separation, in 8th International Conference on Independent Component Analysis and Signal Separation (ICA), 2009
"... Abstract. This paper deals with audio source separation using supervised nonnegative matrix factorization (NMF). We propose a prior model based on mixtures of Gamma distributions for each sound class, which hyperparameters are trained given a training corpus. This formulation allows adapting the sp ..."
Abstract

Cited by 17 (5 self)
 Add to MetaCart
(Show Context)
This paper deals with audio source separation using supervised non-negative matrix factorization (NMF). We propose a prior model based on mixtures of Gamma distributions for each sound class, whose hyperparameters are trained on a training corpus. This formulation allows adapting the spectral basis vectors of the sound sources during actual operation, when the exact characteristics of the sources are not known in advance. Simulations were conducted using a random mixture of two speakers. Even without adaptation, the mixture model outperformed basic NMF, and adaptation further improved the separation quality slightly. Audio demonstrations are available at www.cs.tut.fi/~tuomasv.
Transcribing Multi-Instrument Polyphonic Music with Hierarchical Eigeninstruments, in Sig. Process., 2011
"... Abstract—This paper presents a general probabilistic model for transcribing singlechannel music recordings containing multiple polyphonic instrument sources. The system requires no prior knowledge of the instruments present in the mixture (other than the number), although it can benefit from inform ..."
Abstract

Cited by 16 (2 self)
 Add to MetaCart
(Show Context)
This paper presents a general probabilistic model for transcribing single-channel music recordings containing multiple polyphonic instrument sources. The system requires no prior knowledge of the instruments present in the mixture (other than their number), although it can benefit from information about instrument type if available. In contrast to many existing polyphonic transcription systems, our approach explicitly models the individual instruments and is thereby able to assign detected notes to their respective sources. We use training instruments to learn a set of linear manifolds in model parameter space, which are then used during transcription to constrain the properties of models fit to the target mixture. This leads to a hierarchical mixture-of-subspaces design which makes it possible to supply the system with prior knowledge at different levels of abstraction. The proposed technique is evaluated on both recorded and synthesized mixtures containing two, three, four, and five instruments each. We compare our approach, in terms of transcription with source assignment (i.e., detected pitches must be associated with the correct instrument) and without it, to another multi-instrument transcription system as well as a baseline NMF algorithm. For two-instrument mixtures evaluated with source assignment, we obtain average frame-level F-measures of up to 0.52 in the completely blind transcription setting (i.e., no prior knowledge of the instruments in the mixture) and up to 0.67 if we assume knowledge of the basic instrument types. For transcription without source assignment, these numbers rise to 0.76 and 0.83, respectively.
Index Terms: music, polyphonic transcription, NMF, subspace, eigeninstruments
Convergence-guaranteed multiplicative algorithms for non-negative matrix factorization with β-divergence, in Proc. MLSP, 2010
"... This paper presents a new multiplicative algorithm for nonnegative matrix factorization with βdivergence. The derived update rules have a similar form to those of the conventional multiplicative algorithm, only differing through the presence of an exponent term depending on β. The convergence is th ..."
Abstract

Cited by 16 (6 self)
 Add to MetaCart
(Show Context)
This paper presents a new multiplicative algorithm for non-negative matrix factorization with the β-divergence. The derived update rules have a similar form to those of the conventional multiplicative algorithm, differing only through the presence of an exponent term depending on β. Convergence is theoretically proven for any real-valued β based on the auxiliary function method. The convergence speed is experimentally investigated in comparison with previous works.
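To make the "exponent term depending on β" concrete, here is a hypothetical NumPy sketch of multiplicative β-divergence updates raised to a piecewise exponent of the kind used in convergence-guaranteed variants; the exact piecewise form below is my assumption, not taken from this paper:

```python
import numpy as np

def gamma_exponent(beta):
    """Piecewise exponent applied to the multiplicative updates; this
    exact form is an assumption, modeled on convergence-guaranteed
    beta-divergence NMF variants."""
    if beta < 1.0:
        return 1.0 / (2.0 - beta)
    if beta <= 2.0:
        return 1.0
    return 1.0 / (beta - 1.0)

def nmf_beta(V, rank, beta=1.0, n_iter=200, eps=1e-9, seed=0):
    """Multiplicative beta-divergence NMF updates, each raised to the
    exponent gamma_exponent(beta)."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, rank)) + eps
    H = rng.random((rank, T)) + eps
    g = gamma_exponent(beta)
    for _ in range(n_iter):
        R = W @ H + eps
        H *= ((W.T @ (R ** (beta - 2.0) * V)) /
              (W.T @ R ** (beta - 1.0) + eps)) ** g
        R = W @ H + eps
        W *= (((R ** (beta - 2.0) * V) @ H.T) /
              (R ** (beta - 1.0) @ H.T + eps)) ** g
    return W, H

# toy usage with beta = 1 (KL divergence), where the exponent is 1
V = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]) @ \
    (np.abs(np.random.default_rng(2).random((2, 12))) + 0.1)
W, H = nmf_beta(V, rank=2, beta=1.0)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

For 1 ≤ β ≤ 2 the exponent is 1 and the updates reduce to the conventional multiplicative rules; outside that interval the exponent shrinks the step.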
Unsupervised single-channel music source separation by average harmonic structure modeling, in IEEE Trans. Audio, Speech, and Language Process.
"... ..."
(Show Context)
Discovering speech phones using convolutive non-negative matrix factorisation with a sparseness constraint, in Neurocomputing, 2008
"... Discovering a representation that allows auditory data to be parsimoniously represented is useful for many machine learning and signal processing tasks. Such a representation can be constructed by Nonnegative Matrix Factorisation (NMF), a method for finding partsbased representations of nonnegati ..."
Abstract

Cited by 14 (2 self)
 Add to MetaCart
Discovering a representation that allows auditory data to be parsimoniously represented is useful for many machine learning and signal processing tasks. Such a representation can be constructed by non-negative matrix factorisation (NMF), a method for finding parts-based representations of non-negative data. Here, we present an extension to convolutive NMF that includes a sparseness constraint, where the resultant algorithm has multiplicative updates and utilises the beta divergence as its reconstruction objective. In combination with a spectral magnitude transform of speech, this method discovers auditory objects that resemble speech phones, along with their associated sparse activation patterns. We use these in a supervised separation scheme for monophonic mixtures, finding improved separation performance in comparison to classic convolutive NMF.
Keywords: Non-negative matrix factorisation; Sparse representations; Convolutive
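The effect of a sparseness constraint is easiest to see in the plain (non-convolutive) case: an L1 penalty λ on the activations enters the denominator of the multiplicative update for H. The sketch below shows only that simplified KL-cost form, not the paper's convolutive algorithm with the beta divergence; all names are illustrative:

```python
import numpy as np

def sparse_nmf_kl(V, rank, lam=0.01, n_iter=300, eps=1e-9, seed=0):
    """Plain (non-convolutive) NMF with a KL reconstruction objective and
    an L1 sparseness penalty lam on the activations H. Columns of W are
    renormalized each iteration (with H compensated) so the penalty
    cannot be evaded by rescaling W up and H down."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, rank)) + eps
    H = rng.random((rank, T)) + eps
    ones = np.ones_like(V)
    for _ in range(n_iter):
        R = W @ H + eps
        H *= (W.T @ (V / R)) / (W.T @ ones + lam)  # penalty in denominator
        R = W @ H + eps
        W *= ((V / R) @ H.T) / (ones @ H.T + eps)
        norms = W.sum(axis=0, keepdims=True) + eps
        W /= norms           # unit-sum basis spectra
        H *= norms.T         # keep the product W @ H unchanged
    return W, H

# toy usage on an exactly rank-2 non-negative matrix
V = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]) @ \
    (np.abs(np.random.default_rng(3).random((2, 12))) + 0.1)
W, H = sparse_nmf_kl(V, rank=2)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

Increasing lam trades reconstruction accuracy for sparser activation patterns; the convolutive extension additionally stacks time-shifted bases.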
Non-negative Matrix Factorization with Markov-Chained Bases for Modeling Time-Varying Patterns in Music Spectrograms
"... Abstract. This paper presents a new sparse representation for polyphonic music signals. The goal is to learn the timevarying spectral patterns of musical instruments, such as attack of the piano or vibrato of the violin in polyphonic music signals without any prior information. We model the spectro ..."
Abstract

Cited by 13 (4 self)
 Add to MetaCart
(Show Context)
This paper presents a new sparse representation for polyphonic music signals. The goal is to learn the time-varying spectral patterns of musical instruments, such as the attack of a piano or the vibrato of a violin, in polyphonic music signals without any prior information. We model the spectrogram of music signals under the assumption that it is composed of a limited number of components, each composed of Markov-chained spectral patterns. The proposed model is an extension of non-negative matrix factorization (NMF). An efficient algorithm is derived based on the auxiliary function method.