Non-negative matrix factorization for polyphonic music transcription
- In Proc. IEEE Workshop Applications of Signal Processing to Audio and Acoustics, 2003
Cited by 240 (14 self)
In this paper we present a methodology for analyzing polyphonic musical passages composed of notes that exhibit a harmonically fixed spectral profile (such as piano notes). Taking advantage of this unique note structure, we can model the audio content of the musical passage by a linear basis transform and use non-negative matrix decomposition methods to estimate the spectral profile and the temporal information of every note. This approach results in a very simple and compact system that is not knowledge based, but rather learns notes by observation.
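The decomposition this abstract describes can be illustrated with a minimal sketch. It assumes the standard multiplicative-update rules for the Euclidean cost (Lee and Seung), which may differ from the exact procedure used in the paper. The toy six-bin "spectrogram", the helper names (matmul, transpose, nmf, reconstruction_error) and all parameter values are hypothetical. Plain Python lists are used so that nothing beyond the standard library is required.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Approximate non-negative V (freq x time) by W (freq x rank) times H (rank x time)."""
    rng = random.Random(seed)
    n_freq, n_time = len(v), len(v[0])
    # Strictly positive random start; multiplicative updates keep entries non-negative.
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n_freq)]
    h = [[rng.random() + eps for _ in range(n_time)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[k][t] * num[k][t] / (den[k][t] + eps) for t in range(n_time)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[f][k] * num[f][k] / (den[f][k] + eps) for k in range(rank)]
             for f in range(n_freq)]
    return w, h


def reconstruction_error(v, w, h):
    """Squared Euclidean distance between V and the product W H."""
    approx = matmul(w, h)
    return sum((x - y) ** 2 for rv, ra in zip(v, approx) for x, y in zip(rv, ra))


# Hypothetical toy data: two "notes", each with a fixed spectral profile,
# switched on and off over six time frames (they overlap in frames 2 and 3).
note_a = [1.0, 0.0, 0.5, 0.0, 0.25, 0.0]
note_b = [0.0, 1.0, 0.0, 0.5, 0.0, 0.25]
gains_a = [1, 1, 1, 1, 0, 0]
gains_b = [0, 0, 1, 1, 1, 1]
spectrogram = [[note_a[f] * gains_a[t] + note_b[f] * gains_b[t] for t in range(6)]
               for f in range(6)]

w_early, h_early = nmf(spectrogram, rank=2, iterations=1)
w_late, h_late = nmf(spectrogram, rank=2, iterations=200)
error_early = reconstruction_error(spectrogram, w_early, h_early)
error_late = reconstruction_error(spectrogram, w_late, h_late)
print("error after 1 iteration:", error_early)
print("error after 200 iterations:", error_late)
```

In this reading, the columns of W play the role of per-note spectral profiles and the rows of H the role of their activations over time. Which column ends up matching which note depends on the random initialization.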
Conditions for nonnegative independent component analysis
- IEEE Signal Processing Letters, 2002
Cited by 96 (12 self)
We consider the noiseless linear independent component analysis problem, in the case where the hidden sources s are non-negative. We assume that the random variables s_i are well-grounded in that they have a non-vanishing pdf in the (positive) neighbourhood of zero. For an orthonormal rotation y = Wx of pre-whitened observations x = QAs, under certain reasonable conditions we show that y is a permutation of the s (apart from a scaling factor) if and only if y is non-negative with probability 1. We suggest that this may enable the construction of practical learning algorithms, particularly for sparse non-negative sources.
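A small numerical illustration of the stated condition, offered as a sketch rather than the paper's proof or algorithm. It assumes two independent exponential sources with unit rate (non-negative, unit variance, and with positive density near zero) and an identity mixing matrix, so the observations are already white and x = s. The function name negative_fraction and the sample size are hypothetical.

```python
import math
import random


def negative_fraction(sources, theta):
    """Fraction of samples for which y = R(theta) s has a negative component."""
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    negatives = 0
    for s1, s2 in sources:
        # Apply the 2-D rotation R(theta) to the source pair (s1, s2).
        y1 = cos_t * s1 - sin_t * s2
        y2 = sin_t * s1 + cos_t * s2
        if y1 < 0 or y2 < 0:
            negatives += 1
    return negatives / len(sources)


rng = random.Random(0)
# Assumed sources: independent Exp(1) variables, hence non-negative with unit variance.
sources = [(rng.expovariate(1.0), rng.expovariate(1.0)) for _ in range(2000)]

fraction_identity = negative_fraction(sources, 0.0)          # y equals s
fraction_rotated = negative_fraction(sources, math.pi / 6)   # y mixes the sources
print("negative fraction at 0 degrees:", fraction_identity)
print("negative fraction at 30 degrees:", fraction_rotated)
```

With the identity rotation the outputs are the sources themselves, so no sample is negative. A 30-degree rotation mixes the two sources and pushes some outputs below zero, which is the behaviour the condition predicts for a rotation that is not a permutation.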
A Generative Model for Music Transcription
2005
Cited by 68 (18 self)
In this paper we present a graphical model for polyphonic music transcription. Our model, formulated as a Dynamical Bayesian Network, embodies a transparent and computationally tractable approach to this acoustic analysis problem. An advantage of our approach is that it places emphasis on explicitly modelling the sound generation procedure. It provides a clear framework in which high level (cognitive) prior information on music structure can be coupled with low level (acoustic physical) information in a principled manner to perform the analysis. The model is a special case of the generally intractable switching Kalman filter model. Where possible, we derive exact polynomial-time inference procedures, and otherwise efficient approximations. We argue that our generative model based approach is computationally feasible for many music applications and is readily extensible to more general auditory scene analysis scenarios.
Blind signal decompositions for automatic transcription of polyphonic music
- NMF and K-SVD on the benchmark,” in IEEE
Cited by 27 (6 self)
This paper investigates the behavior of two blind signal decomposition algorithms, non negative matrix factorization (NMF) and non negative K-SVD (NKSVD), in a polyphonic music transcription task. State-of-the-art transcription systems are based on a frame-by-frame, low-level approach; blind systems could be an alternative to them. Two raw but effective audio-to-MIDI systems are proposed and evaluated. Performances are similar, but in favor of NMF, which is more robust to initialization and to the choice of the order, and is computationally less costly. Index Terms: Automatic transcription, polyphonic music, non negative matrix factorization, K-SVD.
Sparse Representations of Polyphonic Music
- Signal Processing, 2005
Cited by 20 (6 self)
We consider two approaches for sparse decomposition of polyphonic music: a time-domain approach based on shift-invariant waveforms, and a frequency-domain approach based on phase-invariant power spectra. When trained on an example of a MIDI-controlled acoustic piano recording, both methods produce dictionary vectors or sets of vectors which represent underlying notes, and produce component activations related to the original MIDI score. The time-domain method is more computationally expensive, but produces sample-accurate spike-like activations and can be used for a direct time-domain reconstruction. The spectral domain method discards phase information, but is faster than the time-domain method and retains more higher-frequency harmonics. These results suggest that these two methods would provide a powerful yet complementary approach to automatic music transcription or object-based coding of musical audio.
Monaural Sound Source Separation by Perceptually Weighted Non-Negative Matrix Factorization
Cited by 10 (0 self)
A data-adaptive algorithm for the separation of sound sources from one-channel signals is presented. The algorithm applies weighted non-negative matrix factorization on the power spectrogram of the input signal. Perceptually motivated weights for each critical band in each frame are used to model the loudness perception of the human auditory system. The method compresses high-energy components, and enables the estimation of perceptually significant low-energy characteristics of sources. The power spectrogram is factorized into a sum of components which have a fixed magnitude spectrum with a time-varying gain. Each source consists of one or more components. The parameters of the components are estimated by minimizing the weighted divergence between the observed power spectrogram and the model, for which a weighted non-negative matrix factorization algorithm is proposed. Simulation experiments were carried out using generated mixtures of pitched musical instrument samples and percussive sounds. The performance of the proposed method was compared with other separation algorithms which are based on the same signal model. These include, for example, independent subspace analysis and sparse coding. According to the simulations, the proposed method enables perceptually better separation quality than the existing algorithms. Demonstration signals are available at
An Independent Component Analysis approach to Automatic Music Transcription
- In Proc. 114th AES Convention, 2003
Cited by 10 (1 self)
This convention paper has been reproduced from the author’s advance manuscript, without editing, corrections, or consideration by the Review Board. The AES takes no responsibility for the contents. Additional papers
Polyphonic Music Retrieval: The N-gram Approach
2004
Cited by 9 (1 self)
This Music Information Retrieval (MIR) study investigates the use of n-grams and textual Information Retrieval (IR) approaches for the retrieval and access of polyphonic music data. IR, synonymous with text IR, implies the task of retrieving documents or texts with information content that is relevant to a user’s information need. With music retrieval, the use of n-grams has largely been confined to monophonic musical sequences. The few studies that have investigated its use with polyphonic music collections typically reduce a polyphonic file into a monophonic sequence for n-gram construction. Techniques for full-music indexing of polyphonic music data with n-grams are investigated. A method to obtain n-grams from polyphonic music data is introduced. The information content of ‘musical n-grams’ is extended to include rhythmic information in addition to intervallic information. For this, ratios of onset times between two adjacent pairs of pitch events are used. To encode ‘musical n-grams’ to obtain ‘musical words’ for indexing, a function that maps interval classes to text characters is formulated, and ranges of ratio bins are defined. These encoding approaches enable encoding of the pitch and rhythm information at vari-
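The encoding idea in this abstract (intervals mapped to characters, onset-time ratios mapped to bins) can be sketched on a simplified case. The sketch below assumes a monophonic sequence with strictly increasing onsets. The abstract does not describe the study's method for extracting n-grams from polyphonic data, so that method is not reproduced here. The letter mapping, the three ratio bins and their thresholds, and the function names are all hypothetical choices made for illustration.

```python
def interval_symbol(semitones):
    """Map a pitch interval to a letter: uppercase if rising, lowercase otherwise.

    Hypothetical mapping: the letter index is the interval size in semitones,
    clamped to 25 so that it always fits in the alphabet.
    """
    size = min(abs(semitones), 25)
    letter = chr(ord("a") + size)
    return letter.upper() if semitones > 0 else letter


def ratio_symbol(ratio):
    """Bin an onset-interval ratio into shorter '-', similar '=', or longer '+'.

    The thresholds 0.75 and 1.5 are assumed values, not taken from the study.
    """
    if ratio < 0.75:
        return "-"
    if ratio <= 1.5:
        return "="
    return "+"


def musical_words(events, n):
    """Encode each window of n consecutive (onset, pitch) events as one word.

    A word interleaves n-1 interval symbols with n-2 onset-ratio symbols.
    """
    words = []
    for start in range(len(events) - n + 1):
        window = events[start:start + n]
        gaps = [window[i + 1][0] - window[i][0] for i in range(n - 1)]
        if any(gap <= 0 for gap in gaps):
            raise ValueError("onsets must be strictly increasing (monophonic input)")
        intervals = [window[i + 1][1] - window[i][1] for i in range(n - 1)]
        symbols = [interval_symbol(intervals[0])]
        for i in range(1, n - 1):
            # Ratio of the current inter-onset gap to the previous one.
            symbols.append(ratio_symbol(gaps[i] / gaps[i - 1]))
            symbols.append(interval_symbol(intervals[i]))
        words.append("".join(symbols))
    return words


# Hypothetical melody: (onset time in seconds, MIDI pitch number).
melody = [(0.0, 60), (0.5, 62), (1.0, 64), (2.0, 60)]
words = musical_words(melody, 3)
print(words)
```

The resulting strings can be indexed by an ordinary text retrieval engine, which is the motivation the abstract gives for mapping musical n-grams to 'musical words'.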
Separation of Musical Sources and Structure from Single-Channel Polyphonic Recordings University of
2006
Cited by 8 (0 self)
The thesis deals principally with the separation of pitched sources from single-channel polyphonic musical recordings. The aim is to extract from a mixture a set of pitched instruments or sources, where each source contains a set of similarly sounding events or notes, and each note is seen as comprising partial, transient and noise content. The work also has implications for separating non-pitched or percussive sounds from recordings, and in general, for unsupervised clustering of a list of detected audio events in a recording into a meaningful set of source classes. The alignment of a symbolic score/MIDI representation with the recording constitutes a pre-processing stage. The three main areas of contribution are: firstly, the design of harmonic tracking algorithms and spectral-filtering techniques for removing harmonics from the mixture, where particular attention has been paid to the case of harmonics which are overlapping in frequency. Secondly, some studies will be presented for separating transient attacks from recordings, both when they are distinguishable from and when they are overlapping in time with other transients. This section also includes a method which proposes that the behaviours of the harmonic and noise components of a note are partially correlated. This is used to share the noise component of a mixture of pitched notes between the interfering sources. Thirdly, unsupervised clustering has been applied to the task of grouping a set of separated notes from the recording into sources, where notes belonging to the same source ideally have similar features or attributes. Issues relating to feature computation, feature selection, dimensionality and dependence on a symbolic music representation are explored. Applications of this work exist in audio spatialisation, audio restoration, music content description, effects processing and elsewhere.