Results 1 - 10 of 27
IMPROVING GENRE CLASSIFICATION BY COMBINATION OF AUDIO AND SYMBOLIC DESCRIPTORS USING A TRANSCRIPTION SYSTEM
Cited by 15 (2 self)
Recent research in music genre classification suggests that a glass ceiling has been reached with timbral audio features. Overcoming it requires combining multiple feature sets with diverse characteristics. We propose a new approach that extends the scope of the features: we transcribe audio data into a symbolic form using a transcription system, extract symbolic descriptors from that representation, and combine them with audio features. With this method, we surpass the glass ceiling and further improve music genre classification, as shown by experiments on three reference music databases and by comparison with previously published results.
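The abstract does not say how the audio and symbolic descriptors are combined. A minimal sketch, assuming simple early fusion (per-feature standardization followed by concatenation into one vector per track), could look like this; the function names and feature values are hypothetical.

```python
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardize each feature column to zero mean and unit variance.

    Constant columns (standard deviation 0) are only centered, not scaled.
    """
    cols = list(zip(*rows))
    stats = [(mean(c), pstdev(c) or 1.0) for c in cols]
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]


def fuse_features(audio_rows, symbolic_rows):
    """Hypothetical early fusion: concatenate normalized audio and symbolic vectors per track."""
    if len(audio_rows) != len(symbolic_rows):
        raise ValueError("both feature sets must describe the same tracks")
    a = zscore_columns(audio_rows)
    s = zscore_columns(symbolic_rows)
    return [ra + rs for ra, rs in zip(a, s)]


# Two tracks: two (made-up) timbral features and one (made-up) symbolic descriptor each.
fused = fuse_features([[1.0, 10.0], [3.0, 10.0]], [[0.2], [0.4]])
```

Standardizing before concatenation keeps descriptors with large numeric ranges from dominating distance-based classifiers; the classifier that consumes the fused vectors is left open here.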
Sound Source Separation in Monaural Music Signals
2006
Cited by 11 (3 self)
Sound source separation refers to the task of estimating the signals produced by individual sound sources from a complex acoustic mixture. It has several applications, since monophonic signals can be processed more efficiently and flexibly than polyphonic mixtures. This thesis deals with the separation of monaural (one-channel) music recordings. We concentrate on separation methods in which the sources to be separated are not known beforehand. Instead, separation is enabled by exploiting common properties of real-world sound sources: their continuity, sparseness, and repetition in time and frequency, and their harmonic spectral structures. One of the separation approaches taken here uses unsupervised learning, and the other uses model-based inference based on sinusoidal modeling. Most existing unsupervised separation algorithms are based on a linear instantaneous signal model, where each frame of the input mixture signal is ...
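Under a linear instantaneous model, each spectrogram frame is approximated as a weighted sum of fixed component spectra, so the whole magnitude spectrogram factors as X ≈ B G. Non-negative matrix factorization (NMF) is one common unsupervised algorithm of this kind; the sketch below uses the standard Lee-Seung multiplicative updates for the Euclidean cost and is not claimed to be the thesis's own algorithm. The function name and parameters are assumptions.

```python
import numpy as np


def nmf(X, n_components, n_iter=200, eps=1e-9, seed=0):
    """Factor a nonnegative magnitude spectrogram X (freq x frames) as X ~= B @ G.

    Columns of B are component spectra; rows of G are their time-varying gains,
    so each frame x_t is modeled as sum_n G[n, t] * B[:, n].
    Uses Lee-Seung multiplicative updates for the Euclidean (Frobenius) cost.
    """
    rng = np.random.default_rng(seed)
    F, T = X.shape
    B = rng.random((F, n_components)) + eps
    G = rng.random((n_components, T)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep both factors nonnegative.
        G *= (B.T @ X) / (B.T @ B @ G + eps)
        B *= (X @ G.T) / (B @ G @ G.T + eps)
    return B, G


# Synthetic 20-bin x 50-frame "spectrogram" built from 3 nonnegative components.
rng = np.random.default_rng(1)
X = rng.random((20, 3)) @ rng.random((3, 50))
B, G = nmf(X, 3)
```

A separated source can then be resynthesized from one component's contribution B[:, [n]] @ G[[n], :], typically used as a mask on the mixture spectrogram.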
Simac: Semantic interaction with music audio contents
Journal of Intelligent Information Systems (accepted), 2005
Using One-Class SVMs and Wavelets for Audio Surveillance Systems
Submitted to IEEE Transactions on Information Forensics and Security
Cited by 4 (0 self)
This paper presents a procedure for recognizing environmental sounds in surveillance and security applications. We propose to apply One-Class Support Vector Machines (1-SVMs), together with a sophisticated dissimilarity measure, as a discriminative framework for audio classification and hence sound recognition. We illustrate the performance of this method on an audio database of more than 1,000 sounds belonging to 9 classes. We also study the use of a set of state-of-the-art audio features and introduce a set of novel features obtained by combining elementary features. Experimental results show the superiority of this novel sound recognition method: the 1-SVM clearly outperforms the conventional HMM-based system, and the largest improvement is achieved when the system is fed a feature set that includes wavelet coefficients.
Multipitch estimation of piano sounds using a new probabilistic spectral smoothness principle
IEEE Trans. Audio, Speech, Lang. Process., 2010
Cited by 4 (0 self)
A new method for the estimation of multiple concurrent pitches in piano recordings is presented. It addresses the issue of overlapping overtones by modeling the spectral envelope of the overtones of each note with a smooth autoregressive model. The background noise is modeled with a moving-average model, and the combination of both tends to eliminate harmonic and sub-harmonic erroneous pitch estimates. This leads to a complete generative spectral model for simultaneous piano notes, which also explicitly includes the typical deviation from exact harmonicity in a piano overtone series. The pitch set that maximizes an approximate likelihood is selected from among a restricted number of possible pitch combinations. Tests have been conducted on a large homemade database called MAPS, composed of piano recordings from a real upright piano and from high-quality samples. Index Terms: acoustic signal analysis, audio processing, multipitch estimation, piano, transcription, spectral smoothness.
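The "typical deviation from exact harmonicity" of piano overtones is commonly described by the stiff-string model, in which the k-th partial of a note with fundamental f0 lies at f_k = k f0 sqrt(1 + B k^2), with B the inharmonicity coefficient. The sketch below assumes this standard parametrization; the exact form used in the paper is not shown in the abstract, and the value of B in the example is illustrative only.

```python
import math


def piano_partials(f0, B, n_partials):
    """Frequencies (Hz) of the first n partials of a stiff string.

    Assumes the standard inharmonicity model f_k = k * f0 * sqrt(1 + B * k**2);
    B = 0 gives an exactly harmonic series.
    """
    return [k * f0 * math.sqrt(1.0 + B * k * k) for k in range(1, n_partials + 1)]


harmonic = piano_partials(100.0, 0.0, 3)     # exactly k * f0
stretched = piano_partials(100.0, 1e-4, 10)  # illustrative B; upper partials drift sharp
```

Because the stretch grows with k, a pitch estimator that searches only at exact multiples of f0 misses upper partials; including B in the spectral model lets the overtone positions match real piano spectra.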
Normalized Cuts for predominant melodic source separation
Proceedings of the International Conference on Music Information Retrieval, 2007
Cited by 3 (2 self)
The predominant melodic source, frequently the singing voice, is an important component of musical signals. In this paper, we describe a method for extracting the predominant source and the corresponding melody from “real-world” polyphonic music. The proposed method is inspired by ideas from computational auditory scene analysis. We formulate predominant melodic source tracking and formation as a graph partitioning problem and solve it using the normalized cut, a global criterion for segmenting graphs that has been used in computer vision. Sinusoidal modeling is used as the underlying representation. A novel harmonicity cue, which we term harmonically wrapped peak similarity, is introduced, and experimental results supporting its use are presented. In addition, we show results for automatic melody extraction using the proposed approach. Index Terms: computational auditory scene analysis (CASA), music information retrieval (MIR), normalized cut, sinusoidal modeling, spectral clustering.
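The normalized cut is usually minimized through its spectral relaxation: solve (D - W) y = λ D y, where W is the affinity matrix and D its diagonal degree matrix, and split the nodes using the eigenvector of the second-smallest eigenvalue. The sketch below shows a single two-way split with thresholding at zero (one simple splitting rule); the toy affinities stand in for the paper's similarity cues between sinusoidal peaks, which are not reproduced here.

```python
import numpy as np


def normalized_cut_bipartition(W):
    """Split a graph in two with the spectral relaxation of the normalized cut.

    W must be a symmetric, nonnegative affinity matrix with no isolated nodes
    (every row sum > 0). Solves (D - W) y = lambda D y via the symmetric
    normalized Laplacian and thresholds the second eigenvector at zero.
    """
    W = np.asarray(W, dtype=float)
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L_sym = np.eye(len(d)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L_sym)  # eigenvalues in ascending order
    y = d_inv_sqrt * eigvecs[:, 1]      # map back to the generalized problem
    return y >= 0


# Toy graph: two groups of three strongly connected nodes, weakly linked to each other.
W = np.full((6, 6), 0.01)
W[:3, :3] = 1.0
W[3:, 3:] = 1.0
np.fill_diagonal(W, 0.0)
labels = normalized_cut_bipartition(W)
```

The eigenvector's sign is arbitrary, so only the grouping matters, not which side is labeled True; more than two clusters can be obtained by recursive splitting or by using several eigenvectors.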
PERCEPTUALLY-BASED EVALUATION OF THE ERRORS USUALLY MADE WHEN AUTOMATICALLY TRANSCRIBING MUSIC
ISMIR 2008, Session 4c: Automatic Music Analysis and Transcription
Cited by 2 (1 self)
This paper investigates the perceptual importance of typical errors occurring when transcribing polyphonic music excerpts into a symbolic form. The case of the automatic transcription of piano music is taken as the target application and two subjective tests are designed. The main test aims at understanding how human subjects rank typical transcription errors such as note insertion, deletion or replacement, note doubling, incorrect note onset or duration, and so forth. The Bradley-Terry-Luce (BTL) analysis framework is used and the results show that pitch errors are more clearly perceived than incorrect loudness estimations or temporal deviations from the original recording. A second test presents a first attempt to include this information in more perceptually motivated measures for evaluating transcription systems.
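In the Bradley-Terry-Luce model, each item i (here, a transcription error type) has a strength p_i, and the probability that i is judged worse or more noticeable than j is p_i / (p_i + p_j). The abstract does not say how the BTL scale values were estimated; a minimal sketch, assuming maximum-likelihood fitting from a paired-comparison count matrix with the classic minorization-maximization (Zermelo/Hunter) update, is shown below. The counts and function names are hypothetical.

```python
def fit_bradley_terry(wins, n_iter=500):
    """Estimate Bradley-Terry-Luce strengths from pairwise comparison counts.

    wins[i][j] = number of times item i was preferred over item j (diagonal ignored).
    Update: p_i <- W_i / sum_{j != i} n_ij / (p_i + p_j), with W_i the total wins
    of i and n_ij = wins[i][j] + wins[j][i]. Strengths are normalized to sum to 1.
    A finite estimate requires every item to be compared at least once and the
    comparison graph to be strongly connected.
    """
    m = len(wins)
    p = [1.0 / m] * m
    for _ in range(n_iter):
        new_p = []
        for i in range(m):
            w_i = sum(wins[i][j] for j in range(m) if j != i)
            denom = sum(
                (wins[i][j] + wins[j][i]) / (p[i] + p[j])
                for j in range(m)
                if j != i
            )
            new_p.append(w_i / denom)
        total = sum(new_p)
        p = [v / total for v in new_p]
    return p


def prob_preferred(p, i, j):
    """Model probability that item i is preferred over item j."""
    return p[i] / (p[i] + p[j])


# Hypothetical counts for three error types compared pairwise by listeners.
wins = [[0, 8, 9], [2, 0, 7], [1, 3, 0]]
strengths = fit_bradley_terry(wins)
```

At convergence, each item's expected number of wins under the fitted model equals its observed number of wins, which is the maximum-likelihood condition for this model.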
AUTOMATIC INSTRUMENT RECOGNITION IN A POLYPHONIC MIXTURE USING SPARSE REPRESENTATIONS
Cited by 2 (0 self)
In this paper, we introduce a method for automatic instrument recognition in polyphonic music. It is based on decomposing the music signal with instrument-specific harmonic atoms, yielding an approximate object representation of the signal. A post-processing step then extracts ensemble saliences that give clues about the number of instruments and their labels. The whole algorithm is applied to artificial mixes of solo performances. Identification of the number of instruments reaches 73% on 10-s segments, and the fully blind identification of the ensemble label, without prior knowledge of the number of instruments, reaches 17%.
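Decompositions over a dictionary of atoms are often computed greedily with matching pursuit. The sketch below is a generic matching pursuit over unit-norm atoms, followed by a hypothetical salience step that sums the absolute coefficients of the selected atoms per instrument label. It assumes a toy identity dictionary in place of the paper's instrument-specific harmonic atoms, and the aggregation rule is an illustrative simplification, not the paper's post-processing.

```python
import numpy as np


def matching_pursuit(x, atoms, n_iter=10):
    """Greedy matching pursuit of signal x over unit-norm atoms (columns of `atoms`).

    Returns the selected atom indices, their coefficients, and the final residual.
    """
    residual = np.asarray(x, dtype=float).copy()
    indices, coefs = [], []
    for _ in range(n_iter):
        corr = atoms.T @ residual          # inner product with every atom
        k = int(np.argmax(np.abs(corr)))   # best-matching atom
        indices.append(k)
        coefs.append(corr[k])
        residual = residual - corr[k] * atoms[:, k]
    return indices, coefs, residual


def instrument_saliences(indices, coefs, atom_labels):
    """Hypothetical post-processing: total |coefficient| of selected atoms per label."""
    sal = {}
    for k, c in zip(indices, coefs):
        sal[atom_labels[k]] = sal.get(atom_labels[k], 0.0) + abs(c)
    return sal


# Toy dictionary: 4 orthonormal atoms, two per (made-up) instrument label.
atoms = np.eye(4)
labels = ["piano", "piano", "flute", "flute"]
idx, cf, res = matching_pursuit([3.0, 0.0, 0.0, 1.0], atoms, n_iter=2)
saliences = instrument_saliences(idx, cf, labels)
```

With real harmonic atoms the dictionary is highly redundant, so the residual decays more slowly and the number of iterations becomes a tuning parameter.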
EFFICIENT BAYESIAN INFERENCE FOR HARMONIC MODELS VIA ADAPTIVE POSTERIOR FACTORIZATION
Cited by 2 (0 self)
AUTOMATIC IDENTIFICATION OF INSTRUMENT CLASSES IN POLYPHONIC AND POLY-INSTRUMENT AUDIO
Cited by 1 (1 self)
We present and compare several models for automatic identification of instrument classes in polyphonic and poly-instrument audio. The goal is to identify which categories of instrument (Strings, Woodwind, Guitar, Piano, etc.) are present in a given audio example. We use a machine learning approach to solve this task. We constructed a system to generate a large database of musically relevant poly-instrument audio from hundreds of instruments classified into 7 categories. Musical audio examples are generated by mixing multi-track MIDI files with thousands of instrument combinations. We compare three classifiers: a Support Vector Machine (SVM), a Multilayer Perceptron (MLP), and a Deep Belief Network (DBN), and show that the DBN tends to outperform both the SVM and the MLP in most cases. ... we generated our own database of audio. Our goal was to have enough variability in the set of instruments to allow us to generalize to instruments not used in the training set. An overview of our system is illustrated in Figure 1.

