Results 1 - 10 of 158
Instrument recognition in polyphonic music based on automatic taxonomies
- IEEE Transactions on Speech and Audio Processing, 2006
Cited by 63 (9 self)
We propose a new approach to instrument recognition in the context of real music orchestrations ranging from solos to quartets. The strength of our approach is that it does not require prior musical source separation. Thanks to a hierarchical clustering algorithm exploiting robust probabilistic distances, we obtain a taxonomy of musical ensembles which is used to efficiently classify possible combinations of instruments played simultaneously. Moreover, a wide set of acoustic features is studied including some new proposals. In particular, Signal to Mask Ratios are found to be useful features for audio classification. This study focuses on a single music genre (i.e. jazz) but combines a variety of instruments among which are percussion and singing voice. Using a varied database of sound excerpts from commercial recordings, we show that the segmentation of music with respect to the instruments played can be achieved with an average accuracy of 53%.
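The taxonomy-building step described above can be sketched as agglomerative clustering under a probabilistic distance. In this illustrative Python sketch, each instrument class is reduced to a one-dimensional Gaussian feature model and compared with a symmetric Kullback-Leibler divergence; the feature models and the single-linkage merge rule are assumptions made for the example, not the authors' exact setup.

```python
# Illustrative sketch: agglomerative clustering of instrument-class models
# under a probabilistic distance (symmetric KL between 1-D Gaussians).
import math

def sym_kl_gauss(m1, s1, m2, s2):
    """Symmetric KL divergence between two 1-D Gaussians N(m, s^2)."""
    def kl(ma, sa, mb, sb):
        return math.log(sb / sa) + (sa**2 + (ma - mb)**2) / (2 * sb**2) - 0.5
    return kl(m1, s1, m2, s2) + kl(m2, s2, m1, s1)

def build_taxonomy(models):
    """Greedy single-linkage agglomeration; returns the merge order."""
    clusters = {i: frozenset([i]) for i in range(len(models))}
    merges = []
    while len(clusters) > 1:
        best = None
        for a in clusters:
            for b in clusters:
                if a < b:
                    d = min(sym_kl_gauss(*models[i], *models[j])
                            for i in clusters[a] for j in clusters[b])
                    if best is None or d < best[0]:
                        best = (d, a, b)
        _, a, b = best
        merged = clusters.pop(a) | clusters.pop(b)
        clusters[max(clusters, default=-1) + 1] = merged
        merges.append(merged)
    return merges

# Three hypothetical instrument classes modeled as (mean, std) of one feature;
# the first two are acoustically close, the third is distant:
models = [(0.0, 1.0), (0.2, 1.0), (5.0, 1.0)]
merges = build_taxonomy(models)
```

The merge order itself is the taxonomy: close classes fuse early into sub-ensembles, and the final merge is the root.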
Hybrid representations for audiophonic signal encoding
- Signal Processing, 2002
Cited by 63 (18 self)
We discuss in this paper a new approach to signal models in the context of audio signal encoding. The method is based upon hybrid models featuring transient, tonal and stochastic components in the signal. Contrary to several existing approaches, our method does not rely on any prior segmentation of the signal. The three components are estimated and encoded using a strategy very much in the spirit of transform coding. While the details of the method described here are tailored to audio signals, the general strategy should also apply to other types of signals exhibiting significantly different features, for example images.
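As a rough illustration of the transient/tonal/stochastic split (not the authors' actual layer-adapted bases), one can peel off a "tonal" layer by keeping the largest Fourier coefficients and a "transient" layer by keeping the largest time-domain samples of the residual; whatever remains is the stochastic layer. The plain FFT, coefficient counts, and test signal are all assumptions of this sketch.

```python
# Minimal numpy sketch of a three-layer hybrid decomposition.
import numpy as np

def hybrid_decompose(x, n_tonal=4, n_transient=4):
    X = np.fft.fft(x)
    keep = np.argsort(np.abs(X))[-n_tonal:]            # strongest partials
    X_tonal = np.zeros_like(X)
    X_tonal[keep] = X[keep]
    tonal = np.fft.ifft(X_tonal).real

    resid = x - tonal
    keep_t = np.argsort(np.abs(resid))[-n_transient:]  # strongest clicks
    transient = np.zeros_like(resid)
    transient[keep_t] = resid[keep_t]

    stochastic = resid - transient                     # what neither layer explains
    return tonal, transient, stochastic

n = 256
t = np.arange(n)
x = np.sin(2 * np.pi * 10 * t / n)                     # a pure tone...
x[100] += 2.0                                          # ...plus a transient click
tonal, transient, stochastic = hybrid_decompose(x)
```

By construction the three layers sum back to the input, so the decomposition is lossless before quantization, which is the property a transform-coding strategy needs.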
Towards Autonomous Agents for Live Computer Music: Realtime Machine Listening and Interactive Music Systems
2006
Union of MDCT bases for audio coding
- IEEE Trans. on Audio, Speech and Lang. Proc., 2008
Cited by 19 (7 self)
This paper investigates the use of sparse overcomplete decompositions for audio coding. Audio signals are decomposed over a redundant union of modified discrete cosine transform (MDCT) bases having eight different scales. This approach produces a sparser decomposition than the traditional MDCT-based orthogonal transform and allows better coding efficiency at low bitrates. Contrary to state-of-the-art low bitrate coders, which are based on pure parametric or hybrid representations, our approach is able to provide transparency. Moreover, we use a bitplane encoding approach, which provides a fine-grain scalable coder that can seamlessly operate from very low bitrates up to transparency. Objective evaluation, as well as listening tests, show that the performance of our coder is significantly better than a state-of-the-art transform coder at very low bitrates and has similar performance at high bitrates. We provide a link to test soundfiles and source code to allow better evaluation and reproducibility of the results. Index Terms: audio coding, matching pursuit, scalable coding, signal representations, sparse representations.
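The core decomposition step here is matching pursuit over a union of bases. The following numpy sketch runs plain matching pursuit over a toy two-scale cosine dictionary standing in for the paper's eight MDCT bases; the atom construction and the test signal are illustrative, and the bitplane coding stage is not reproduced.

```python
# Matching pursuit over an overcomplete two-scale cosine dictionary.
import numpy as np

def matching_pursuit(x, D, n_iter=10):
    """Greedy MP: repeatedly subtract the atom best correlated with the residual."""
    residual = x.copy()
    coeffs = np.zeros(D.shape[1])
    for _ in range(n_iter):
        corr = D.T @ residual                # atoms are unit-norm columns
        k = int(np.argmax(np.abs(corr)))
        coeffs[k] += corr[k]
        residual -= corr[k] * D[:, k]
    return coeffs, residual

n = 64
atoms = []
for k in range(n):                           # long-scale cosine atoms (DCT-II-like)
    a = np.cos(np.pi * (np.arange(n) + 0.5) * k / n)
    atoms.append(a / np.linalg.norm(a))
for half in range(2):                        # short-scale atoms on each half window
    for k in range(n // 2):
        a = np.zeros(n)
        seg = np.cos(np.pi * (np.arange(n // 2) + 0.5) * k / (n // 2))
        a[half * n // 2:(half + 1) * n // 2] = seg
        atoms.append(a / np.linalg.norm(a))
D = np.stack(atoms, axis=1)                  # 64 samples x 128 atoms: overcomplete

x = 3.0 * D[:, 5] + 0.5 * D[:, 12]           # signal sparse in the long-scale basis
coeffs, residual = matching_pursuit(x, D, n_iter=5)
```

Because the signal is exactly sparse in one of the bases, the greedy pursuit recovers both atoms and drives the residual to numerical zero; real audio only gets sparser, not sparse, which is where the rate savings come from.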
On Finding Melodic Lines in Audio Recordings
2004
Cited by 17 (1 self)
The paper presents our approach to the problem of finding melodic line(s) in polyphonic audio recordings. The approach is composed of two different stages, partially rooted in psychoacoustic theories of music perception: the first stage is dedicated to finding regions with strong and stable pitch (melodic fragments), while in the second stage, these fragments are grouped according to their properties (pitch, loudness, etc.) into clusters which represent melodic lines of the piece. The Expectation Maximization (EM) algorithm is used in both stages: to find the dominant pitch in a region, and to train the Gaussian Mixture Models that group fragments into melodies. The paper presents the entire process in more detail and provides some initial results.
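The first-stage idea, estimating a dominant pitch with EM, can be sketched as fitting a single Gaussian "pitch" component against a flat background density over spectral peak frequencies. The peak values, the background level, and the sigma below are made-up illustration parameters, not the paper's.

```python
# EM sketch: one Gaussian pitch component vs. a flat background density.
import math

def dominant_pitch(freqs, n_iter=20, sigma=20.0, bg=1e-4):
    """Iterate E (responsibilities) and M (weighted mean) on peak frequencies."""
    mu = sum(freqs) / len(freqs)                 # start from the plain mean
    for _ in range(n_iter):
        resp = []
        for f in freqs:
            g = (math.exp(-0.5 * ((f - mu) / sigma) ** 2)
                 / (sigma * math.sqrt(2 * math.pi)))
            resp.append(g / (g + bg))            # E-step: pitch vs. background
        mu = sum(r * f for r, f in zip(resp, freqs)) / sum(resp)  # M-step
    return mu

# Hypothetical spectral peaks: a tight pitch region near 440 Hz plus an outlier.
peaks = [438.0, 440.0, 442.0, 441.0, 900.0]
pitch = dominant_pitch(peaks)
```

The background component is what makes the estimate robust: the outlier's responsibility collapses toward zero, so it stops pulling on the mean.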
Scalable Perceptual Mixing and Filtering of Audio Signals using an Augmented Spectral Representation
- In 8th Int. Conference on Digital Audio Effects, 2005
Cited by 16 (8 self)
Many interactive applications, such as video games, require processing a large number of sound signals in real-time. This paper proposes a novel perceptually-based and scalable approach for efficiently filtering and mixing a large number of audio signals. Key to its efficiency is a pre-computed Fourier frequency-domain representation augmented with additional descriptors. The descriptors can be used during the real-time processing to estimate which signals are not going to contribute to the final mixture. We also propose an importance sampling strategy that allows tuning the processing load relative to the quality of the output. We demonstrate our approach for a variety of applications including equalization and mixing, reverberation processing and spatialization. It can also be used to optimize audio data streaming or decompression. By reducing the number of operations and limiting bus traffic, our approach yields a 3 to 15-fold improvement in overall processing rate compared to brute-force techniques, with minimal degradation of the output.
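The contribution-estimation idea can be sketched with per-band energy descriptors: sum them into a rough mixture estimate, then drop any source that stays below a masking-style floor in every band. The -30 dB floor, the band layout, and the synthetic descriptors below are illustrative assumptions, not the paper's calibrated model.

```python
# Descriptor-based source pruning before an expensive mixing pass.
import numpy as np

def prune_sources(band_energies, threshold_db=-30.0):
    """Keep only sources whose precomputed per-band energy can plausibly
    contribute to the mix. band_energies: (n_sources, n_bands)."""
    mix = band_energies.sum(axis=0)                 # rough mixture estimate
    floor = mix * 10.0 ** (threshold_db / 10.0)     # masking-style floor per band
    audible = (band_energies > floor).any(axis=1)   # contributes in some band?
    return np.flatnonzero(audible)

rng = np.random.default_rng(0)
loud = rng.uniform(0.5, 1.0, size=(3, 8))    # three loud sources
quiet = np.full((2, 8), 1e-6)                # two sources far below the mix
idx = prune_sources(np.vstack([loud, quiet]))
```

Only the surviving indices need full filtering and mixing, which is where the claimed 3 to 15-fold speedup would come from in a large scene.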
Tone-evoked excitatory and inhibitory synaptic conductances of primary auditory cortex neurons
- J. Neurophysiol.
Cited by 14 (0 self)
A singer identification technique for content-based classification of MP3 music objects
- In Proceedings of International Conference on Information and Knowledge Management, 2002
Cited by 13 (0 self)
With the growing amount of MP3 music data available on the Internet, problems related to music classification and content-based music retrieval have recently attracted increasing attention. In this paper, we propose an approach to automatically classify MP3 music objects according to their singers. First, the coefficients extracted from the output of the polyphase filters are used to compute the MP3 features for segmentation. Based on these features, an MP3 music object can be decomposed into a sequence of notes (or phonemes). Then for each MP3 phoneme in the training set, its MP3 feature is extracted and used to train an MP3 classifier which can identify the singer of an unknown MP3 music object. Experiments are performed and analyzed to show the effectiveness of the proposed method.
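The overall pipeline (segment, extract per-segment features, classify each segment, aggregate to a singer decision) can be sketched with a nearest-centroid classifier standing in for the paper's model; the random feature vectors below are stand-ins, not real polyphase-filter coefficients, and the singer names are hypothetical.

```python
# Sketch: per-segment classification followed by majority vote per music object.
import numpy as np

def train_centroids(segments, labels):
    """Nearest-centroid stand-in for the paper's per-phoneme classifier."""
    return {s: np.mean([f for f, l in zip(segments, labels) if l == s], axis=0)
            for s in set(labels)}

def identify_singer(segments, centroids):
    """Classify each segment, then majority-vote over the whole object."""
    votes = [min(centroids, key=lambda s: np.linalg.norm(f - centroids[s]))
             for f in segments]
    return max(set(votes), key=votes.count)

rng = np.random.default_rng(1)
train_a = rng.normal(0.0, 0.3, size=(20, 4))   # singer "A" training segments
train_b = rng.normal(2.0, 0.3, size=(20, 4))   # singer "B" training segments
centroids = train_centroids([*train_a, *train_b], ["A"] * 20 + ["B"] * 20)

unknown = list(rng.normal(2.0, 0.3, size=(5, 4)))  # unlabeled object (really "B")
who = identify_singer(unknown, centroids)
```

The majority vote over segments is what makes the per-object decision more robust than any single phoneme-level classification.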
Progressive perceptual audio rendering of complex scenes
- In Proceedings of ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games (I3D), 2007
Cited by 13 (4 self)
Figure 1: Left: A scene with 1815 mobile sound sources. Audio is rendered in realtime with our progressive lossy processing technique using 15% of the frequency coefficients and with an average of 12 clusters for 3D audio processing. Degradations compared to the reference solution are minimal. Right: a scene with 1004 mobile sound sources, running with 25% of the frequency coefficients and 12 clusters.
Despite recent advances, including sound source clustering and perceptual auditory masking, high quality rendering of complex virtual scenes with thousands of sound sources remains a challenge. Two major bottlenecks appear as the scene complexity increases: the cost of clustering itself, and the cost of pre-mixing source signals within each cluster. In this paper, we first propose an improved hierarchical clustering algorithm that remains efficient for large numbers of sources and clusters while providing progressive refinement capabilities. We then present a lossy pre-mixing method based on a progressive representation of the input audio signals and the perceptual importance of each sound source. Our quality evaluation user tests indicate that the recently introduced audio saliency map is inappropriate for this task. Consequently we propose a "pinnacle", loudness-based metric, which gives the best results for a variety of target computing budgets. We also performed a perceptual pilot study which indicates that in audio-visual environments, it is better to allocate more clusters to visible sound sources. We propose a new clustering metric using this result. As a result of these three solutions, our system can provide high quality rendering of thousands of 3D-sound sources on a "gamer-style" PC.
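The budget-allocation idea, weighting sources by loudness with extra weight for visible ones (per the pilot-study finding), can be sketched as follows; the boost factor and the numbers are illustrative, not the paper's tuned values.

```python
# Sketch: split a fixed coefficient budget across sources by loudness,
# boosting visible sources as the perceptual pilot study suggests.
def allocate_budget(loudness, visible, total_coeffs, visible_boost=2.0):
    weights = [l * (visible_boost if v else 1.0)
               for l, v in zip(loudness, visible)]
    total_w = sum(weights)
    return [round(total_coeffs * w / total_w) for w in weights]

# Three hypothetical sources: one loud and visible, two quieter off-screen ones.
budget = allocate_budget([1.0, 0.5, 0.25], [True, False, False], 700)
```

The scheme is progressive by construction: shrinking `total_coeffs` degrades every source proportionally instead of dropping any of them outright.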
Gaussian Mixture Models for Extraction of Melodic Lines from Audio Recordings
2004
Cited by 13 (0 self)
The presented study deals with extraction of melodic line(s) from polyphonic audio recordings. We base our work on the use of the expectation maximization (EM) algorithm, which is employed in a two-step procedure that finds melodic lines in audio signals. In the first step, EM is used to find regions in the signal with strong and stable pitch (melodic fragments). In the second step, these fragments are grouped into clusters according to their properties (pitch, loudness, etc.). The obtained clusters represent distinct melodic lines. Gaussian Mixture Models, trained with EM, are used for clustering. The paper presents the entire process in more detail and gives some initial results.
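The clustering step can be sketched with a tiny EM loop for a two-component one-dimensional Gaussian mixture over fragment pitches; real fragments would carry more properties (loudness, timing), and the pitch values below are hypothetical.

```python
# Minimal EM for a two-component 1-D Gaussian mixture over fragment pitches.
import numpy as np

def gmm_em_1d(x, n_iter=50):
    mu = np.array([x.min(), x.max()])        # deterministic spread-out init
    var = np.full(2, np.var(x))
    pi = np.full(2, 0.5)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each fragment
        d = (np.exp(-0.5 * (x[:, None] - mu) ** 2 / var)
             / np.sqrt(2 * np.pi * var))
        r = pi * d
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixture weights, means and variances
        nk = r.sum(axis=0)
        pi = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return mu, r

# Hypothetical fragment pitches (Hz) drawn from two melodic lines:
pitches = np.array([220.0, 222.0, 218.0, 440.0, 438.0, 442.0])
mu, resp = gmm_em_1d(pitches)
lines = resp.argmax(axis=1)                  # fragment-to-line assignment
```

Each mixture component ends up modeling one melodic line, and the argmax over responsibilities gives the fragment-to-line assignment.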