Automatic Transcription of Music
, 2001
Cited by 30 (0 self)
A system for the automatic transcription of music is described. Signal processing methods are introduced that solve different facets of the overall problem. Main emphasis is laid on finding the multiple pitches of concurrent musical sounds. Sound onset detection and musical meter estimation are described to some extent. Other topics discussed are noise robustness, estimation of the number of concurrent voices, sound separation, and musical instrument recognition. The presented system is evaluated using a database of musical sounds, synthesized MIDI songs, and CD recordings. Also, the performance of the system is compared to that of human listeners.
Separation of sound sources by convolutive sparse coding
- in Proc. ISCA Tutorial and Research Workshop on Statistical and Perceptual Audio Processing, 2004. [Online] Available: http://journal.speech.cs.cmu.edu/SAPA2004
, 2004
Cited by 26 (3 self)
An algorithm for the separation of sound sources is presented. Each source is parametrized as a convolution between a time-frequency magnitude spectrogram and an onset vector. The source model is able to represent several types of sounds, for example repetitive drum sounds and harmonic sounds with modulations. An iterative algorithm is proposed for the estimation of the parameters. The algorithm is based on minimizing the reconstruction error and the number of onsets. The number of onsets is minimized by applying the sparse coding scheme for onset vectors. A way of modeling the loudness perception of the human auditory system is proposed. The method compresses high-energy sources, and enables the separation of low-energy sources which are perceptually significant. The algorithm is able to separate meaningful sources from real-world signals. Simulation experiments were carried out using mixtures of harmonic instruments. Demonstration signals are available at
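The source model in this abstract (an event spectrogram convolved in time with an onset vector) can be illustrated with a short sketch. This covers only the synthesis side of the model, not the paper's estimation algorithm or its loudness weighting; the function name `render_source` and the toy values are hypothetical.

```python
import numpy as np

def render_source(event, onsets):
    """Convolve an event spectrogram with an onset vector along time.

    event  : (n_freq, event_len) non-negative magnitude spectrogram of one event
    onsets : (n_frames,) non-negative onset vector (mostly zeros when sparse)
    Returns an (n_freq, n_frames) source spectrogram.
    """
    event = np.asarray(event, dtype=float)
    onsets = np.asarray(onsets, dtype=float)
    n_frames = len(onsets)
    # Each frequency row is convolved with the onset vector; the tail that
    # extends past the last frame is truncated.
    return np.stack([np.convolve(row, onsets)[:n_frames] for row in event])

# Toy example (illustrative values): a two-frame event triggered at
# frame 0 with weight 1 and at frame 3 with weight 2.
event = np.array([[1.0, 0.5],
                  [0.2, 0.1]])
onsets = np.array([1.0, 0.0, 0.0, 2.0, 0.0])
source = render_source(event, onsets)
```

A sparse onset vector means the event is placed at only a few time positions, which is why the abstract pairs the reconstruction error with a penalty on the number of onsets.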
Survey of Sparse and Non-Sparse Methods in Source Separation
, 2005
Cited by 23 (1 self)
Source separation arises in a variety of signal processing applications, ranging from speech processing to medical image analysis. The separation of a superposition of multiple signals is accomplished by taking into account the structure of the mixing process and by making assumptions about the sources. When the information about the mixing process and sources is limited, the problem is called ‘blind’. By assuming that the sources can be represented sparsely in a given basis, recent research has demonstrated that solutions to previously problematic blind source separation problems can be obtained. In some cases, solutions are possible to problems intractable by previous non-sparse methods. Indeed, sparse methods provide a powerful approach to the separation of linear mixtures of independent data. This paper surveys the recent arrival of sparse blind source separation methods and the previously existing non-sparse methods, providing insights and appropriate hooks into the literature along the way.
Separation of drums from polyphonic music using non-negative matrix factorization and support vector machine
- In: Proc. EUSIPCO’2005
, 2005
Cited by 19 (2 self)
This paper presents a procedure for the separation of pitched musical instruments and drums from polyphonic music. The method is based on two-stage processing in which the input signal is first separated into elementary time-frequency components which are then organized into sound sources. Non-negative matrix factorization (NMF) is used to separate the input spectrogram into components having a fixed spectrum with time-varying gain. Each component is classified as either a pitched instrument or a drum using a support vector machine (SVM). The classifier is trained using example signals from both classes. Simulation experiments were carried out using mixtures generated from real-world polyphonic music signals. The results indicate that the proposed method enables better separation quality than existing methods based on sinusoidal modeling and onset detection. Demonstration signals are available at
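The factorization into fixed spectra with time-varying gains can be sketched as below. The abstract does not state the cost function or update rule, so this sketch assumes the Lee and Seung multiplicative updates for the Euclidean cost; the function name `nmf` and the toy spectrogram are illustrative only, and the SVM classification stage is not shown.

```python
import numpy as np

def nmf(V, n_components, n_iter=200, eps=1e-9, seed=0):
    """Approximate a non-negative spectrogram V (freq x time) as W @ H.

    Columns of W are fixed component spectra; rows of H are their
    time-varying gains. Assumes Euclidean-cost multiplicative updates.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, n_components)) + eps
    H = rng.random((n_components, n_frames)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep every entry non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy spectrogram built from two known spectra and gain curves
# (illustrative values, not data from the paper).
spectra = np.array([[1.0, 0.0],
                    [0.5, 0.0],
                    [0.0, 1.0],
                    [0.0, 0.8]])
gains = np.array([[1.0, 0.0, 1.0, 0.0, 0.5],
                  [0.0, 1.0, 1.0, 0.0, 0.0]])
V = spectra @ gains

W, H = nmf(V, n_components=2)
rel_error = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

In a two-stage system of the kind described, each pair of a column of `W` and the matching row of `H` would then be passed to a classifier as one component.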
Automatic music transcription as we know it today
- Journal of New Music Research
, 2004
Cited by 18 (0 self)
The aim of this overview is to describe methods for the automatic transcription of Western polyphonic music. The transcription task is here understood as transforming an acoustic musical signal into a MIDI-like symbolic representation. Only pitched musical instruments are considered: recognizing the sounds of drum instruments is not discussed. The main emphasis is laid on estimating the multiple fundamental frequencies of several concurrent sounds. Various approaches to solve this problem are discussed, including methods that are based on modelling the human auditory periphery, methods that mimic the human auditory scene analysis function, signal model-based Bayesian inference methods, and data-adaptive methods. Another subproblem addressed is the rhythmic parsing of acoustic musical signals. From the transcription point of view, this amounts to the temporal segmentation of music signals at different time scales. The relationship between the two subproblems and the general structure of the transcription problem is discussed.
Convolutive Speech Bases and their Application to Supervised Speech Separation
- IEEE Trans. on Speech and Audio Processing
, 2007
Cited by 16 (4 self)
In this paper we present a convolutive basis decomposition method and its application to the separation of simultaneous speakers from monophonic recordings. The model we propose is a convolutive version of the non-negative matrix factorization algorithm. Due to the non-negativity constraint, this type of coding is very well suited for intuitively and efficiently representing magnitude spectra. We present results that reveal the nature of these basis functions and we introduce their utility in separating monophonic mixtures of known speakers.
Instrument Identification in Solo and Ensemble Music using Independent Subspace Analysis
- Proc. ISMIR
, 2004
Cited by 15 (2 self)
We investigate the use of Independent Subspace Analysis (ISA) for instrument identification in musical recordings. We represent short-term log-power spectra of possibly polyphonic music as weighted non-linear combinations of typical note spectra plus background noise. These typical note spectra are learnt either on databases containing isolated notes or on solo recordings from different instruments. We show that this model has some theoretical advantages over methods based on Gaussian Mixture Models (GMM) or on linear ISA. Preliminary experiments with five instruments and test excerpts taken from commercial CDs give promising results. The performance on clean solo excerpts is comparable with existing methods and shows limited degradation under reverberant conditions. Applied to a difficult duo excerpt, the model is also able to identify the right pair of instruments and to provide an approximate transcription of the notes played by each instrument.
Drum Transcription with nonnegative spectrogram factorisation
- In EUSIPCO
, 2005
Cited by 15 (5 self)
This paper describes a novel method for the automatic transcription of drum sequences. The method is based on separating the target drum sounds from the input signal using non-negative matrix factorisation, and on detecting sound onsets from the separated signals. The separation algorithm factorises the spectrogram of the input signal into a sum of instrument spectrograms, each having a fixed spectrum and a time-varying gain. The spectra are calculated from a set of training signals, and the time-varying gains are estimated with an algorithm stemming from non-negative matrix factorisation. Onset times of the instruments are detected from the estimated time-varying gains. The system gave better results than two state-of-the-art methods in simulations with acoustic signals containing polyphonic drum sequences, and an overall hit rate of 96% was achieved. Demonstration signals are available at
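Detecting onsets from a time-varying gain curve can be sketched as follows. The abstract does not say which detection rule is used, so this sketch assumes a simple one: mark a frame as an onset when the frame-to-frame rise in gain exceeds a threshold and is a local peak. The function name `detect_onsets`, the threshold, and the gain values are hypothetical.

```python
import numpy as np

def detect_onsets(gain, threshold):
    """Return frame indices where the gain rises sharply.

    A frame t is reported when the positive rise gain[t] - gain[t-1]
    exceeds `threshold` and is a local peak of the rise curve.
    """
    gain = np.asarray(gain, dtype=float)
    # rise[t] = gain[t] - gain[t-1], with rise[0] = 0; decays are clipped to 0.
    rise = np.maximum(np.diff(gain, prepend=gain[0]), 0.0)
    onsets = []
    for t in range(1, len(rise)):
        next_rise = rise[t + 1] if t + 1 < len(rise) else 0.0
        if rise[t] > threshold and rise[t] >= rise[t - 1] and rise[t] > next_rise:
            onsets.append(t)
    return onsets

# Toy gain curve for one drum (illustrative values): two hits with decays.
gain = [0.0, 0.0, 1.0, 0.6, 0.3, 0.1, 0.9, 0.5, 0.2, 0.0]
onsets = detect_onsets(gain, threshold=0.3)
```

With this toy curve the two sharp rises fall at frames 2 and 6; in practice the threshold would need tuning per instrument.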
Sound Source Separation in Monaural Music Signals
, 2006
Cited by 11 (3 self)
Sound source separation refers to the task of estimating the signals produced by individual sound sources from a complex acoustic mixture. It has several applications, since monophonic signals can be processed more efficiently and flexibly than polyphonic mixtures. This thesis deals with the separation of monaural, or one-channel, music recordings. We concentrate on separation methods where the sources to be separated are not known beforehand. Instead, the separation is enabled by utilizing the common properties of real-world sound sources, which are their continuity, sparseness, and repetition in time and frequency, and their harmonic spectral structures. One of the separation approaches taken here uses unsupervised learning and the other uses model-based inference based on sinusoidal modeling. Most of the existing unsupervised separation algorithms are based on a linear instantaneous signal model, where each frame of the input mixture signal is
Sparse Representations of Polyphonic Music
- Signal Processing
, 2005
Cited by 10 (6 self)
We consider two approaches for sparse decomposition of polyphonic music: a time-domain approach based on shift-invariant waveforms, and a frequency-domain approach based on phase-invariant power spectra. When trained on an example of a MIDI-controlled acoustic piano recording, both methods produce dictionary vectors or sets of vectors which represent underlying notes, and produce component activations related to the original MIDI score. The time-domain method is more computationally expensive, but produces sample-accurate spike-like activations and can be used for a direct time-domain reconstruction. The spectral-domain method discards phase information, but is faster than the time-domain method and retains more higher-frequency harmonics. These results suggest that these two methods would provide a powerful yet complementary approach to automatic music transcription or object-based coding of musical audio.

