Instrument Identification in Polyphonic Music: Feature Weighting to Minimize Influence of Sound Overlaps
, 2007
Cited by 15 (2 self)
We provide a new solution to the problem of feature variations caused by the overlapping of sounds in instrument identification in polyphonic music. When multiple instruments play simultaneously, partials (harmonic components) of their sounds overlap and interfere, which makes the acoustic features different from those of monophonic sounds. To cope with this, we weight features based on how much they are affected by overlapping. First, we quantitatively evaluate the influence of overlapping on each feature as the ratio of the within-class variance to the between-class variance in the distribution of training data obtained from polyphonic sounds. Then, we generate feature axes using a weighted mixture that minimizes the influence via linear discriminant analysis. In addition, we improve instrument identification using musical context. Experimental results showed that the recognition rates using both feature weighting and musical context were 84.1% for duo, 77.6% for trio, and 72.3% for quartet; those without using either were 53.4%, 49.6%, and 46.5%, respectively.
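The per-feature influence measure described in this abstract (within-class variance divided by between-class variance) can be illustrated with a minimal sketch. The function name `variance_ratio` and the toy data are hypothetical and are not taken from the paper; the sketch covers only the ratio, not the weighted mixture or the linear discriminant analysis step.

```python
def variance_ratio(samples, labels):
    """Per-feature ratio of within-class to between-class scatter.

    samples: list of feature vectors (lists of floats), one per training sound.
    labels:  list of class labels (e.g. instrument names), same length.
    A larger ratio means the feature varies more inside a class than
    across classes. Assumes class means differ for every feature;
    otherwise the between-class term is zero and the division fails.
    """
    n_features = len(samples[0])
    overall_mean = [
        sum(row[j] for row in samples) / len(samples) for j in range(n_features)
    ]
    within = [0.0] * n_features
    between = [0.0] * n_features
    for label in set(labels):
        rows = [row for row, lab in zip(samples, labels) if lab == label]
        for j in range(n_features):
            class_mean = sum(r[j] for r in rows) / len(rows)
            # Sums of squares; the common 1/N factor cancels in the ratio.
            within[j] += sum((r[j] - class_mean) ** 2 for r in rows)
            between[j] += len(rows) * (class_mean - overall_mean[j]) ** 2
    return [w / b for w, b in zip(within, between)]


# Toy data: feature 0 separates the two classes well, feature 1 does not.
toy_samples = [[0.0, 0.0], [2.0, 2.0], [10.0, 1.0], [12.0, 3.0]]
toy_labels = ["flute", "flute", "violin", "violin"]
ratios = variance_ratio(toy_samples, toy_labels)
```

In this toy case the second feature gets the larger ratio, so a weighting scheme of the kind the abstract describes would down-weight it.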
Separating voices in polyphonic music: A contig mapping approach
- In Computer Music Modeling and Retrieval: Second International Symposium
, 2004
Cited by 11 (0 self)
Voice separation is a critical component of music information retrieval, music analysis and automated transcription systems. We present a contig mapping approach to voice separation based on perceptual principles. The algorithm runs in O(n^2) time, uses only pitch height and event boundaries, and requires no user-defined parameters. The method segments a piece into contigs according to voice count, then reconnects fragments in adjacent contigs using a shortest distance strategy. The order of connection is by distance from maximal voice contigs, where the voice ordering is known. This contig-mapping algorithm has been implemented in VoSA, a Java-based voice separation analyzer software. The algorithm performed well when applied to J. S. Bach’s Two- and Three-Part Inventions and the forty-eight Fugues from the Well-Tempered Clavier. We report an overall average fragment consistency of 99.75%, correct fragment connection rate of 94.50% and average voice consistency of 88.98%, metrics which we propose to measure voice separation performance.
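The first step named in this abstract, segmenting a piece into contigs according to voice count, can be sketched roughly as follows. This is an illustrative simplification, not the VoSA implementation: the note representation `(onset, offset, pitch)` and the function name `segment_by_voice_count` are assumptions, and the reconnection of fragments between adjacent contigs is not shown.

```python
def segment_by_voice_count(notes):
    """Split the timeline into maximal segments with a constant note count.

    notes: list of (onset, offset, pitch) tuples with offset > onset.
    Returns a list of (start, end, count) tuples in time order.
    """
    # Every onset and offset is a candidate segment boundary.
    boundaries = sorted({t for onset, offset, _ in notes for t in (onset, offset)})
    segments = []
    for start, end in zip(boundaries, boundaries[1:]):
        # Count the notes that sound throughout [start, end].
        count = sum(
            1 for onset, offset, _ in notes if onset <= start and offset >= end
        )
        if segments and segments[-1][2] == count:
            # Same voice count as the previous interval: extend that segment.
            segments[-1] = (segments[-1][0], end, count)
        else:
            segments.append((start, end, count))
    return segments


# Two simultaneous notes for four beats, then a single note for two beats.
example_notes = [(0, 4, 60), (0, 2, 64), (2, 4, 67), (4, 6, 72)]
contigs = segment_by_voice_count(example_notes)
```

Here the first four beats form one two-voice segment and the last two beats a one-voice segment.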
E.: A statistical approach to retrieval under user-dependent uncertainty in query-by-humming systems
- In 2004 Multimedia Information Retrieval (ACM-MIR04)
, 2004
Cited by 3 (0 self)
Robustly addressing uncertainty in query formulation and search is one of the most challenging problems in multimedia information retrieval (MIR) systems. In this paper, a statistical approach to the problem of retrieval under the effect of
Horizontal and Vertical Integration/Segregation in Auditory Streaming: A Voice Separation Algorithm for Symbolic Musical Data
- In proceedings of the conference Sound and Music Computing (SMC07), Lefkada
, 2007
Cited by 2 (0 self)
Listeners are thought to be capable of perceiving multiple voices in music. Adopting a perceptual view of musical ‘voice’ that corresponds to the notion of auditory stream, a computational model is developed that splits a musical score (symbolic musical data) into different voices. A single ‘voice’ may consist of more than one synchronous note perceived as belonging to the same auditory stream; in this sense, the proposed algorithm may separate a given musical work into fewer voices than the maximum number of notes in the greatest chord (e.g. a piece consisting of four or more concurrent notes may be separated simply into melody and accompaniment). This is paramount, not only in the study of auditory streaming per se, but also for developing MIR systems that enable pattern recognition and extraction within musically pertinent ‘voices’ (e.g. melodic lines). The algorithm is tested qualitatively and quantitatively against a small dataset that acts as ground truth.
‘Voice’ separation: theoretical, perceptual and computational perspectives
The notions of ‘voice’, as well as homophony and polyphony, are thought to be well understood by musicians. Listeners are thought to be capable of perceiving multiple ‘voices’ in music. However, there exists no systematic theory that describes how ‘voices’ can be identified, especially when polyphonic and homophonic elements are mixed together. The paper presents different views of what ‘voice’ means and how the problem of voice separation can be described systematically, with a view to understanding the problem better and developing a systematic perceptually-based description of the cognitive task of segregating ‘voices’ in music. Vague (or even contradictory) treatments of this issue will be presented. Elements of a systematic theory that can be implemented as a computer program are also proposed. WHAT IS A VOICE? It appears that the term ‘voice’ has different meanings for different research fields (traditional musicology, music cognition, computational musicology). Recently, there have been a number of attempts (e.g. Temperley, 2001;
VISA: THE VOICE INTEGRATION/SEGREGATION ALGORITHM
Listeners are capable of perceiving multiple voices in music. Adopting a perceptual view of musical ‘voice’ that corresponds to the notion of auditory stream, a computational model is developed that splits musical scores (symbolic musical data) into different voices. A single ‘voice’ may consist of more than one synchronous note perceived as belonging to the same auditory stream; in this sense, the proposed algorithm may separate a given musical work into fewer voices than the maximum number of notes in the greatest chord. This is paramount, among other things, for developing MIR systems that enable pattern recognition and extraction within musically pertinent ‘voices’ (e.g. melodic lines). The algorithm is tested against a small dataset that acts as ground truth.
MUSICAL VOICE INTEGRATION/SEGREGATION: VISA REVISITED
The Voice Integration/Segregation Algorithm (VISA) proposed by Karydis et al. [7] splits musical scores (symbolic musical data) into different voices, based on a perceptual view of musical voice that corresponds to the notion of auditory stream. A single ‘voice’ may consist of more than one synchronous note perceived as belonging to the same auditory stream. The algorithm was initially tested against a handful of musical works that were carefully selected so as to contain a steady number of streams (contrapuntal voices or melody with accompaniment). The initial algorithm was successful on this small dataset, but was shown to run into serious problems in cases where the number of streams/voices changed during the course of a musical work. A new version of the algorithm has been developed that attempts to solve this problem; the new version additionally includes an improved mechanism for context-dependent breaking of chords and for keeping streams homogeneous. The new algorithm performs equally well on the old dataset, but gives much better results on the new, larger and more diverse dataset.
Monaural Musical Sound Separation Based on Pitch and Common Amplitude Modulation
Monaural musical sound separation has been extensively studied recently. An important problem in separation of pitched musical sounds is the estimation of time–frequency regions where harmonics overlap. In this paper, we propose a sinusoidal modeling-based separation system that can effectively resolve overlapping harmonics. Our strategy is based on the observations that harmonics of the same source have correlated amplitude envelopes and that the change in phase of a harmonic is related to the instrument’s pitch. We use these two observations in a least squares estimation framework for separation of overlapping harmonics. The system directly distributes mixture energy for harmonics that are unobstructed by other sources. Quantitative evaluation of the proposed system is shown when ground truth pitch information is available, when rough pitch estimates are provided in the form of a MIDI score, and finally, when a multipitch tracking algorithm is used. We also introduce a technique to improve the accuracy of rough pitch estimates. Results show that the proposed system significantly outperforms related monaural musical sound separation systems.
Index Terms: Common amplitude modulation (CAM), musical sound separation, sinusoidal modeling, time–frequency masking, underdetermined sound separation.
mdhoffma at cs·princeton·edu, prc at cs·princeton·edu
We present two simple perceptually motivated audio effects designed to increase the perceived sensory dissonance/roughness (a process we call “dissonancization”) of audio input. The first involves heterodyning multiple bands of the audio signal at different frequencies to break each sinusoid in each band into two sinusoids separated in frequency by the amount that Kameoka and Kuriyagawa [1] predict will produce a maximally dissonant effect. The second attempts to increase the depth of modulation caused by existing beating partials by exponentiating the amplitude envelope within small bands, enhancing the perceived roughness already present in the signal. The first algorithm can produce very dramatic effects even for very consonant inputs, whereas the second tends to produce a more subtle effect. Both algorithms are quite simple to understand and implement and computationally inexpensive enough to be used in real time, but produce perceptually interesting results. The effects can be selectively applied so as to affect only desired frequency ranges, and can be continuously controlled (e.g. in a performance context) to have more or less impact.
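The splitting of one sinusoid into two, as described for the first effect, follows from the identity sin(A)·cos(B) = ½[sin(A+B) + sin(A−B)]. The sketch below illustrates only that identity on a single tone. The function name `heterodyne` and the separation value are hypothetical; the band splitting and the choice of separation from Kameoka and Kuriyagawa's predictions are not reproduced here.

```python
import math


def heterodyne(samples, sample_rate, separation_hz):
    """Multiply a signal by a cosine at half the desired separation.

    For a sinusoidal input at frequency f, the product contains two
    sinusoids at f + separation_hz / 2 and f - separation_hz / 2,
    each at half the input amplitude.
    """
    half = separation_hz / 2.0
    return [
        x * math.cos(2.0 * math.pi * half * n / sample_rate)
        for n, x in enumerate(samples)
    ]


# A 440 Hz tone split into components 30 Hz apart (illustrative values).
rate = 8000
tone = [math.sin(2.0 * math.pi * 440.0 * n / rate) for n in range(256)]
split = heterodyne(tone, rate, 30.0)

# Direct construction of the two expected components, for comparison.
expected = [
    0.5 * (math.sin(2.0 * math.pi * 455.0 * n / rate)
           + math.sin(2.0 * math.pi * 425.0 * n / rate))
    for n in range(256)
]
```

The two lists agree to within floating-point error, which is the sense in which heterodyning "breaks" each sinusoid into a pair of nearby, beating sinusoids.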

