Results 1 - 8 of 8
Monaural sound source separation by nonnegative matrix factorization with temporal continuity and sparseness criteria
IEEE Transactions on Audio, Speech, and Language Processing, 2007
"... Abstract—An unsupervised learning algorithm for the separation of sound sources in one-channel music signals is presented. The algorithm is based on factorizing the magnitude spectrogram of an input signal into a sum of components, each of which has a fixed magnitude spectrum and a time-varying gain ..."
Cited by 189 (30 self)
Abstract:
An unsupervised learning algorithm for the separation of sound sources in one-channel music signals is presented. The algorithm is based on factorizing the magnitude spectrogram of an input signal into a sum of components, each of which has a fixed magnitude spectrum and a time-varying gain. Each sound source, in turn, is modeled as a sum of one or more components. The parameters of the components are estimated by minimizing the reconstruction error between the input spectrogram and the model, while restricting the component spectrograms to be nonnegative and favoring components whose gains are slowly varying and sparse. Temporal continuity is favored by a cost term that sums the squared differences between the gains in adjacent frames, and sparseness is favored by penalizing nonzero gains. The proposed iterative estimation algorithm is initialized with random values, and the gains and the spectra are then alternately updated using multiplicative update rules until the values converge. Simulation experiments were carried out on generated mixtures of pitched musical instrument samples and drum sounds. The performance of the proposed method was compared with independent subspace analysis and basic nonnegative matrix factorization, which are based on the same linear model. According to these simulations, the proposed method achieves better separation quality than the previous algorithms. In particular, the temporal continuity criterion improved the detection of pitched musical sounds; the sparseness criterion did not produce significant improvements. Index Terms: acoustic signal analysis, audio source separation, blind source separation, music, nonnegative matrix factorization, sparse coding, unsupervised learning.
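The cost function and update scheme described above can be sketched compactly. The following is a minimal illustration, not the paper's exact algorithm: it uses a plain Euclidean reconstruction cost plus a squared-difference continuity penalty on the gains (the paper normalizes its cost terms and adds a sparseness term, both omitted here), with the continuity gradient split into positive and negative parts so the updates stay multiplicative.

```python
import numpy as np

def nmf_temporal_continuity(X, n_components, beta=0.1, n_iter=200, seed=0):
    """Minimal sketch: minimize ||X - W H||_F^2
    + beta * sum_t ||H[:, t] - H[:, t-1]||^2 with multiplicative updates.
    Not the paper's exact (normalized) cost or update rules."""
    rng = np.random.default_rng(seed)
    F, T = X.shape
    W = rng.random((F, n_components)) + 1e-9   # fixed magnitude spectra
    H = rng.random((n_components, T)) + 1e-9   # time-varying gains

    for _ in range(n_iter):
        # Standard Lee-Seung multiplicative update for the spectra.
        W *= (X @ H.T) / (W @ H @ H.T + 1e-9)

        # Split the continuity gradient: neighboring gains feed the
        # numerator; each frame's own gain, weighted by its neighbor
        # count, feeds the denominator.
        left = np.zeros_like(H); left[:, 1:] = H[:, :-1]
        right = np.zeros_like(H); right[:, :-1] = H[:, 1:]
        n_nbrs = np.full(T, 2.0); n_nbrs[0] = n_nbrs[-1] = 1.0

        num = W.T @ X + beta * (left + right)
        den = W.T @ W @ H + beta * n_nbrs * H + 1e-9
        H *= num / den
    return W, H
```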
Separation of drums from polyphonic music using non-negative matrix factorization and support vector machine
In Proc. EUSIPCO 2005, 2005
"... This paper presents a procedure for the separation of pitched musical instruments and drums from polyphonic music. The method is based on two-stage processing in which the input signal is first separated into elementary time-frequency components which are then organized into sound sources. Non-negat ..."
Cited by 56 (4 self)
Abstract:
This paper presents a procedure for the separation of pitched musical instruments and drums from polyphonic music. The method is based on two-stage processing in which the input signal is first separated into elementary time-frequency components, which are then organized into sound sources. Non-negative matrix factorization (NMF) is used to separate the input spectrogram into components having a fixed spectrum with a time-varying gain. Each component is classified as either pitched instruments or drums using a support vector machine (SVM) trained on example signals from both classes. Simulation experiments were carried out using mixtures generated from real-world polyphonic music signals. The results indicate that the proposed method achieves better separation quality than existing methods based on sinusoidal modeling and onset detection. Demonstration signals are available at
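A rough shape of the two-stage pipeline, under stated assumptions: the per-component features (spectral centroid, gain crest factor) and the training setup are placeholders for illustration; the paper's actual feature set is not reproduced here.

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.svm import SVC

def component_features(spectrum, gains):
    # Toy features (assumed, not the paper's): spectral centroid of the
    # basis spectrum, and crest factor of the gain envelope, since drum
    # gains tend to be spiky while pitched gains are smoother.
    bins = np.arange(len(spectrum))
    centroid = (bins * spectrum).sum() / (spectrum.sum() + 1e-9)
    crest = gains.max() / (gains.mean() + 1e-9)
    return np.array([centroid, crest])

def separate_drums(X, clf, n_components=20):
    # X: magnitude spectrogram (freq x time). Stage 1: NMF into fixed
    # spectra with time-varying gains. Stage 2: label each component.
    model = NMF(n_components=n_components, init='random', random_state=0,
                max_iter=500)
    W = model.fit_transform(X)      # basis spectra, freq x comp
    H = model.components_           # gains, comp x time
    feats = np.stack([component_features(W[:, k], H[k])
                      for k in range(n_components)])
    is_drum = clf.predict(feats).astype(bool)   # SVM trained offline
    return W[:, is_drum] @ H[is_drum]           # drum-only spectrogram

# The classifier would be trained on components extracted from labelled
# example signals, e.g.: clf = SVC().fit(train_feats, train_labels)
```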
Extraction of drum tracks from polyphonic music using Independent Subspace Analysis
In Proceedings of the 4th International Symposium on Independent Component Analysis and Blind Signal Separation (ICA2003), 2003
"... The analysis and separation of audio signals into their original components is an important prerequisite to automatic transcription of music, extraction of metadata from audio data, and speaker separation in video conferencing. In this paper, a method for the separation of drum tracks from polyphoni ..."
Cited by 48 (4 self)
Abstract:
The analysis and separation of audio signals into their original components is an important prerequisite for automatic transcription of music, extraction of metadata from audio data, and speaker separation in video conferencing. In this paper, a method for the separation of drum tracks from polyphonic music is proposed. It consists of an independent component analysis and a subsequent partitioning of the derived components into subspaces containing the percussive and the harmonic, sustained instruments. With the proposed method, different samples of popular music have been analyzed. The results show sufficient separation of drum tracks and non-drum tracks for subsequent metadata extraction. Informal listening tests indicate moderate audio quality of the resulting audio signals.
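For concreteness, here is a minimal sketch of the Independent Subspace Analysis front end using FastICA on the spectrogram, plus one possible percussiveness cue for partitioning components into the two subspaces; the cue is an assumption for illustration, not the paper's actual partitioning criterion.

```python
import numpy as np
from sklearn.decomposition import FastICA

def isa_components(X, n_components=10):
    # X: magnitude spectrogram (freq x time). ICA over time frames yields
    # statistically independent time envelopes; the mixing matrix holds
    # the corresponding spectra.
    ica = FastICA(n_components=n_components, random_state=0)
    envelopes = ica.fit_transform(X.T).T   # comp x time activations
    spectra = ica.mixing_                  # freq x comp basis spectra
    return spectra, envelopes

def percussiveness(envelope):
    # Assumed heuristic: percussive envelopes fluctuate strongly from
    # frame to frame relative to their mean level; thresholding this
    # score would split components into drum and non-drum subspaces.
    e = np.abs(envelope)
    return np.abs(np.diff(e)).mean() / (e.mean() + 1e-9)
```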
Sound Source Separation in Monaural Music Signals
2006
"... Sound source separation refers to the task of estimating the signals produced by individual sound sources from a complex acoustic mixture. It has several applications, since monophonic signals can be processed more efficiently and flexibly than polyphonic mixtures. This thesis deals with the separat ..."
Cited by 36 (4 self)
Abstract:
Sound source separation refers to the task of estimating the signals produced by individual sound sources from a complex acoustic mixture. It has several applications, since monophonic signals can be processed more efficiently and flexibly than polyphonic mixtures. This thesis deals with the separation of monaural, or one-channel, music recordings. We concentrate on separation methods where the sources to be separated are not known beforehand. Instead, the separation is enabled by exploiting common properties of real-world sound sources: their continuity, sparseness, and repetition in time and frequency, and their harmonic spectral structure. One of the separation approaches taken here uses unsupervised learning, and the other uses model-based inference built on sinusoidal modeling. Most of the existing unsupervised separation algorithms are based on a linear instantaneous signal model, where each frame of the input mixture signal is modeled as a weighted sum of fixed basis spectra.
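Written out, with notation assumed here to match the NMF formulation in the first result, the linear instantaneous model is:

```latex
% Linear instantaneous model (notation assumed for illustration):
% each frame of the mixture spectrogram is a weighted sum of fixed
% basis spectra with frame-dependent gains.
\[
  \mathbf{x}_t \;\approx\; \sum_{n=1}^{N} g_{n,t}\,\mathbf{b}_n
  \qquad\Longleftrightarrow\qquad
  \mathbf{X} \;\approx\; \mathbf{B}\mathbf{G}
\]
```

where x_t is the magnitude spectrum of frame t, b_n the fixed spectrum of component n, and g_{n,t} its gain in frame t.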
The CUIDADO Project
In ISMIR, 2002
"... The CUIDADO Project (Content-based Unified Interfaces and Descriptors for Audio/music Databases available Online) aims at developing a new chain of applications through the use of audio/music content descriptors, in the spirit of the MPEG-7 standard. The project includes the design of appropriate de ..."
Cited by 24 (3 self)
Abstract:
The CUIDADO project (Content-based Unified Interfaces and Descriptors for Audio/music Databases available Online) aims at developing a new chain of applications through the use of audio/music content descriptors, in the spirit of the MPEG-7 standard. The project includes the design of appropriate description structures, the development of extractors for deriving high-level information from audio signals, and the design and implementation of two applications: the Sound Palette and the Music Browser. These applications include new features that systematically exploit high-level descriptors and provide users with content-based access to large catalogues of audio/music material. The Sound Palette focuses on audio samples and targets professional users, whereas the Music Browser addresses a broader audience through the management of popular music titles. After a presentation of the project objectives and methodology, we describe the original features of the two applications, based on the systematic use of descriptors, and the technical architecture framework on which they rely.
High-resolution sinusoidal analysis for resolving harmonic collisions in music audio signal processing
"... Many music signals can largely be considered an additive combination of multiple sources, such as musical instruments or voice. If the musical sources are pitched instruments, the spectra they produce are predominantly har-monic, and are thus well suited to an additive sinusoidal model. However, due ..."
Abstract:
Many music signals can largely be considered an additive combination of multiple sources, such as musical instruments or voice. If the musical sources are pitched instruments, the spectra they produce are predominantly harmonic and are thus well suited to an additive sinusoidal model. However, due to resolution limits inherent in time-frequency analyses, when the harmonics of multiple sources occupy equivalent time-frequency regions, their individual properties are additively combined in the time-frequency representation of the mixed signal. Any such time-frequency point in a mixture where multiple harmonics overlap produces a single observation from which the contributions owed to each of the individual harmonics cannot be trivially deduced. These overlaps are referred to as overlapping partials or harmonic collisions. If one wishes to infer information about individual sources in music mixtures, the information carried in regions where collided harmonics exist becomes unreliable due to interference from other sources.
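A minimal numpy illustration of the phenomenon: two partials at the same frequency add as complex amplitudes in the time-frequency representation, so the observed magnitude depends on their relative phase and cannot be split back into per-source contributions. Signal parameters here are arbitrary choices for the demonstration.

```python
import numpy as np

sr, n = 8000, 1024
t = np.arange(n) / sr
win = np.hanning(n)

def peak_mag(x):
    # Magnitude at the strongest analysis bin of a windowed DFT.
    return np.abs(np.fft.rfft(win * x)).max()

a = np.cos(2 * np.pi * 440 * t)                 # partial from source A
b = np.cos(2 * np.pi * 440 * t + 0.9 * np.pi)   # source B, near antiphase
print(peak_mag(a), peak_mag(b))   # individual partial magnitudes
print(peak_mag(a + b))            # collided bin: far below their sum
```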
unknown title
"... This article deals with relations between randomness and structure in audio and musical sounds. Randomness, in the casual sense, refers to something that has an element of variation or surprise in it, whereas structure refers to something more predictable, rule-based, or even deterministic. When dea ..."
Abstract:
This article deals with relations between randomness and structure in audio and musical sounds. Randomness, in the casual sense, refers to something that has an element of variation or surprise in it, whereas structure refers to something more predictable, rule-based, or even deterministic. When dealing with noise, which is the “purest” type of randomness, one usually adopts the canonical physical or engineering definition of noise as a signal with a white spectrum, i.e., composed of equal or almost equal energies at all frequencies. This seems to imply that noise is a complex phenomenon simply because it contains many frequency components. (Mathematically, to qualify as a random or stochastic process, the density of the frequency components must be such that the signal has a continuous spectrum, whereas periodic components appear as spectral lines or delta functions.) This reasoning is contradicted by the fact that, to our perception, noise is a rather simple signal, and in terms of its musical use it does not allow much structural manipulation or organization. Musical notes or other repeating or periodic acoustic components in music are closer to being deterministic and can be considered “structure.” However, complex musical signals, such as polyphonic or orchestral music containing simultaneous contributions from multiple instrumental sources, often have a spectrum so dense that it approaches a noise-like spectrum. In such situations, the structure of the signal cannot be revealed by looking at the spectrum alone. The physical definition of noise as a signal with a smooth or approximately continuous spectrum therefore obscures other significant properties of signals versus noise, such as whether a given signal has temporal structure; in other words, whether the signal can be predicted. The article presents a novel approach to (automatic) analysis of music based on an “anticipation
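The engineering definition quoted above is easy to make concrete. A small illustrative sketch: spectral flatness, the ratio of the geometric to the arithmetic mean of the power spectrum, is near 1 for a white-spectrum noise and near 0 for a periodic signal whose energy sits in a few spectral lines, yet it says nothing about whether the signal is predictable in time, which is exactly the property the article argues the spectrum alone obscures.

```python
import numpy as np

def spectral_flatness(x):
    # Geometric mean / arithmetic mean of the power spectrum:
    # ~1 for a flat (white) spectrum, ~0 for a few spectral lines.
    p = np.abs(np.fft.rfft(x)) ** 2 + 1e-12
    return np.exp(np.log(p).mean()) / p.mean()

rng = np.random.default_rng(0)
noise = rng.standard_normal(4096)                        # white noise
tone = np.sin(2 * np.pi * 200 * np.arange(4096) / 4096)  # periodic signal
print(spectral_flatness(noise))   # close to 1
print(spectral_flatness(tone))    # close to 0
```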