Results 1 - 10
of
52
Automatic Musical Genre Classification Of Audio Signals
- IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING
, 2002
"... ... describe music. They are commonly used to structure the increasing amounts of music available in digital form on the Web and are important for music information retrieval. Genre categorization for audio has traditionally been performed manually. A particular musical genre is characterized by sta ..."
Abstract
-
Cited by 422 (22 self)
- Add to MetaCart
... describe music. They are commonly used to structure the increasing amounts of music available in digital form on the Web and are important for music information retrieval. Genre categorization for audio has traditionally been performed manually. A particular musical genre is characterized by statistical properties related to the instrumentation, rhythmic structure and form of its members. In this work, algorithms for the automatic genre categorization of audio signals are described. More specifically, we propose a set of features for representing texture and instrumentation. In addition a novel set of features for representing rhythmic structure and strength is proposed. The performance of those feature sets has been evaluated by training statistical pattern recognition classifiers using real world audio collections. Based on the automatic hierarchical genre classification two graphical user interfaces for browsing and interacting with large audio collections have been developed.
A Multi-Pitch Tracking Algorithm for Noisy Speech
- IEEE Transactions on Speech and Audio Processing
, 2002
"... We present a robust algorithm for multi-pitch tracking of noisy speech. Our approach integrates an improved channel and peak selection method, a new integration method for extracting periodicity information across different frequency channels, and a hidden Markov model (HMM) for forming continuous p ..."
Abstract
-
Cited by 49 (11 self)
- Add to MetaCart
We present a robust algorithm for multi-pitch tracking of noisy speech. Our approach integrates an improved channel and peak selection method, a new integration method for extracting periodicity information across different frequency channels, and a hidden Markov model (HMM) for forming continuous pitch tracks, and as a result, our algorithm can reliably track single and double pitch tracks in a noisy environment. The proposed algorithm is evaluated on a database of speech utterances mixed with various interferences and the results show that our algorithm outperforms existing algorithms significantly.
Multiple Fundamental Frequency Estimation Based on Harmonicity and Spectral Smoothness
, 2003
"... A new method for estimating the fundamental frequencies of concurrent musical sounds is described. The method is based on an iterative approach, where the fundamental frequency of the most prominent sound is estimated, the sound is subtracted from the mixture, and the process is repeated for the res ..."
Abstract
-
Cited by 46 (5 self)
- Add to MetaCart
A new method for estimating the fundamental frequencies of concurrent musical sounds is described. The method is based on an iterative approach, where the fundamental frequency of the most prominent sound is estimated, the sound is subtracted from the mixture, and the process is repeated for the residual signal. For the estimation stage, an algorithm is proposed which utilizes the frequency relationships of simultaneous spectral components, without assuming ideal harmonicity. For the subtraction stage, the spectral smoothness principle is proposed as an efficient new mechanism in estimating the spectral envelopes of detected sounds. With these techniques, multiple fundamental frequency estimation can be performed quite accurately in a single time frame, without the use of long-term temporal features. The experimental data comprised recorded samples of 30 musical instruments from four different sources. Multiple fundamental frequency estimation was performed for random sound source and pitch combinations. Error rates for mixtures ranging from one to six simultaneous sounds were 1.8%, 3.9%, 6.3%, 9.9%, 14%, and 18%, respectively. In musical interval and chord identification tasks, the algorithm outperformed the average of ten trained musicians. The method works robustly in noise, and is able to handle sounds that exhibit inharmonicities. The inharmonicity factor and spectral envelope of each sound is estimated along with the fundamental frequency.
Melody Matching Directly from Audio
- Indiana University
, 2001
"... In this paper we explore a technique for content-based music retrieval using a continuous pitch contour derived from a recording of the audio query instead of a quantization of the query into discrete notes. Our system determines the pitch for each unit of time in the query and then uses a time-warp ..."
Abstract
-
Cited by 44 (8 self)
- Add to MetaCart
In this paper we explore a technique for content-based music retrieval using a continuous pitch contour derived from a recording of the audio query instead of a quantization of the query into discrete notes. Our system determines the pitch for each unit of time in the query and then uses a time-warping algorithm to match this string of pitches against songs in a database of MIDI files. This technique, while much slower at matching, is usually far more accurate than techniques based on discrete notes. It would be an ideal technique to use to provide the final ranking of candidate results produced by a faster but lest robust matching algorithm. 1.
Multiple fundamental frequency estimation by summing harmonic amplitudes
- in ISMIR
, 2006
"... This paper proposes a conceptually simple and computationally efficient fundamental frequency (F0) estimator for polyphonic music signals. The studied class of estimators calculate the salience, or strength, of a F0 candidate as a weighted sum of the amplitudes of its harmonic partials. A mapping fr ..."
Abstract
-
Cited by 32 (5 self)
- Add to MetaCart
This paper proposes a conceptually simple and computationally efficient fundamental frequency (F0) estimator for polyphonic music signals. The studied class of estimators calculate the salience, or strength, of a F0 candidate as a weighted sum of the amplitudes of its harmonic partials. A mapping from the Fourier spectrum to a “F0 salience spectrum” is found by optimization using generated training material. Based on the resulting function, three different estimators are proposed: a “direct ” method, an iterative estimation and cancellation method, and a method that estimates multiple F0s jointly. The latter two performed as well as a considerably more complex reference method. The number
A Comparison of Melodic Database Retrieval Techniques Using Sung Queries
, 2002
"... Query-by-humming systems search a database of music for good matches to a sung, hummed, or whistled melody. Errors in transcription and variations in pitch and tempo can cause substantial mismatch between queries and targets. Thus, algorithms for measuring melodic similarity in query-by-humming syst ..."
Abstract
-
Cited by 32 (6 self)
- Add to MetaCart
Query-by-humming systems search a database of music for good matches to a sung, hummed, or whistled melody. Errors in transcription and variations in pitch and tempo can cause substantial mismatch between queries and targets. Thus, algorithms for measuring melodic similarity in query-by-humming systems should be robust. We compare several variations of search algorithms in an effort to improve search precision. In particular, we describe a new frame-based algorithm that significantly outperforms note-by-note algorithms in tests using sung queries and a database of MIDI-encoded music.
Automatic Transcription of Music
, 2001
"... A system for the automatic transcription of music is described. Signal processing methods are introduced that solve different facets of the overall problem. Main emphasis is laid on finding the multiple pitches of concurrent musical sounds. Sound onset detection and musical meter estimation are desc ..."
Abstract
-
Cited by 30 (0 self)
- Add to MetaCart
A system for the automatic transcription of music is described. Signal processing methods are introduced that solve different facets of the overall problem. Main emphasis is laid on finding the multiple pitches of concurrent musical sounds. Sound onset detection and musical meter estimation are described to some extent. Other topics discussed are noise robustness, estimation of the number of concurrent voices, sound separation, and musical instrument recognition. The presented system is evaluated using a database of musical sounds, synthesized MIDI-songs, and CDrecordings. Also, the performance of the system is compared to that of human listeners. 1.
Name that tune: A pilot study in finding a melody from a sung query
- Journal of the American Society for Information Science and Technology
, 2004
"... We have created a system for music search and retrieval. A user sings a theme from the desired piece of music. The sung theme (query) is converted into a sequence of pitch-intervals and rhythms. This sequence is compared to musical themes (targets) stored in a database. The top pieces are returned t ..."
Abstract
-
Cited by 25 (7 self)
- Add to MetaCart
We have created a system for music search and retrieval. A user sings a theme from the desired piece of music. The sung theme (query) is converted into a sequence of pitch-intervals and rhythms. This sequence is compared to musical themes (targets) stored in a database. The top pieces are returned to the user in order of similarity to the sung theme. We describe, in detail, two different approaches to measuring similarity between database themes and the sung query. In the first, queries are compared to database themes using standard string-alignment algorithms. Here, similarity between target and query is determined by edit cost. In the second approach, pieces in the database are represented as hidden Markov models (HMMs). In this approach, the query is treated as an observation sequence and a target is judged similar to the query if its HMM has a high likelihood of generating the query. In this article we report our approach to the construction of a target database of themes, encoding, and transcription of user queries, and the results of preliminary experimentation with a set of sung queries. Our experiments show that while no approach is clearly superior to the other system, string matching has a slight advantage. Moreover, neither approach surpasses human performance.
Pitch histograms in audio and symbolic music information retrieval
- Proceedings of the Third International Conference on Music Information Retrieval: ISMIR
, 2002
"... In order to represent musical content, pitch and timing information is utilized in the majority of existing work in Symbolic Music Information Retrieval (MIR). Symbolic representations such as MIDI allow the easy calculation of such information and its manipulation. In contrast, most of the existing ..."
Abstract
-
Cited by 24 (0 self)
- Add to MetaCart
In order to represent musical content, pitch and timing information is utilized in the majority of existing work in Symbolic Music Information Retrieval (MIR). Symbolic representations such as MIDI allow the easy calculation of such information and its manipulation. In contrast, most of the existing work in Audio MIR uses timbral and beat information, which can be calculated using automatic computer audition techniques. In this paper, Pitch Histograms are defined and proposed as a way to represent the pitch content of music signals both in symbolic and audio form. This representation is evaluated in the context of automatic musical genre classification. A multiple-pitch detection algorithm for polyphonic signals is used to calculate Pitch Histograms for audio signals. In order to evaluate the extent and significance of errors resulting from the automatic multiple-pitch detection, automatic musical genre classification results from symbolic and audio data are compared. The comparison indicates that Pitch Histograms provide valuable information for musical genre classification. The results obtained for both symbolic and audio cases indicate that although pitch errors degrade classification performance for the audio case, Pitch Histograms can be effectively used for classification in both cases. 1.

